Proceed with caution


Proposed molecular testing of a person’s age highlights difficult questions for scientists and society.


Eva Perón claimed she was younger for political reasons. Walt
Disney pretended to be older to sign up for war. All sorts of
people lie about their age for all sorts of reasons, and they've
been doing so for a long time. Scandals over the past decade have
forced authorities to act on false claims from footballers: the world
governing body FIFA now routinely scans the wrist bones of players
entering youth competitions to check that the athletes are truly young
enough to compete.

Wrist scanning is one of several anatomical tests available that
claim to be able to determine an individual’s maturity. They are
cumbersome and unreliable. More-accurate tests are on the way. But the
implications are profound, and must be discussed by researchers,
policymakers and the public.

One reason that scientists are trying to improve age testing is
to address controversy in Europe over the age of refugees. United
Nations rules say that those under 18 years old must receive particular
protection and assistance. Some adults say they are younger than they
are to claim these benefits, and tabloid anger over these rare cases has
fuelled political and public intolerance.

If officials who assess age on the basis of physical attributes such as
height, voice and facial features suspect that a refugee is concealing their
true age, they can apply tests that assess the maturity of teeth or bones.

Experts are right to condemn these techniques as being unreliable
and to complain they could deny vulnerable children the help
afforded to them in national laws. Could a more-reliable and
evidence-based test help? Such a test would not answer all the
questions posed by Europe’s refugee crisis; almost four million people
have claimed asylum since 2014, sparking a rise of xenophobia in
parts of society and creating difficult decisions about how best to help
immigrants and who to help most. But accurate tests could, in theory,
help to make the basis of these decisions objective and transparent.

As we report in a News story (see page 15), some scientists say that
age could be determined more accurately using a molecular test called
an epigenetic clock. It looks for distinctive chemical marks that are
known to steadily accumulate on DNA. In theory, the test could be
performed on a simple cheek swab. The researchers developing it are
confident that this method could reliably predict age to within one or
two years, much better than the currently used anatomical tests, which
can estimate only to within three or four years. But they also highlight
that the epigenetic clock currently performs poorly for many people.
And no biological test will ever be able to say for sure if a person is 17
years, 11 months or 18 years old.

Such assays must be subjected to extensive, rigorous testing across
different populations, and their limitations must be made clear.
Furthermore, the ethical implications should be fully debated before the
tests are used to determine the age of refugees, which potentially has
life-changing consequences. Such tests must always come with full
consent and privacy safeguards.

There are also important implications in other spheres. A more
accurate way to determine age would be useful in forensics work, for
example, to help build up a profile of a suspect from blood or semen.
Success would depend on the sample being large enough for analysis.
What’s more, countries such as Germany currently prohibit such
information being extracted from DNA tests.

Age fraud is a widespread problem in sport. In 2010, the discovery 
of such deception forced a Chinese gymnastics team to return the gold 
medals it had won at the Sydney Olympics a 
decade earlier. One competitor was declared 
to have been only 14 at the time of the competition,
two years below the required age.

Last week saw violent anti-immigrant demonstrations, this time in
Chemnitz in eastern Germany. Even though the number of refugees
arriving in Europe has plummeted (49,000 had arrived by this July,
compared with 1.3 million throughout 2016), tensions remain high.

Given all this, researchers should not develop, or make claims for, 
age-determining tests without extreme care and wide discussion. 
That will take time: time for the rest of society to ponder how and 
whether such tests should be used. Age is not just a number. Much 
can be at stake.


Climate politics 


Global warming tops the Australian agenda as 
climate debates depose a third prime minister. 


Australia has two pressing environmental problems: climate
change and finding a leader who can tackle it. Large swathes
of the country are suffering the effects of a seven-year drought,
the bush fire season has hit those parts two months early, and the
destruction of the Great Barrier Reef grows more severe each year. Yet
late last month, the country’s attempts to make some modest changes
to its energy policy to help reduce greenhouse-gas emissions blew up
an internal storm in the ruling Liberal party that cost Prime Minister
Malcolm Turnbull his job.

To lose one prime minister to political fights about climate-change
policy is unfortunate. Two would be careless, but Turnbull is actually
the third Australian premier to fall in this way in under a decade. What
is going on? And what does this turmoil say about attempts to rein in
damaging carbon emissions elsewhere?

All politics is local, and Australian climate politics more so than
most. Although Australian scientists are world leaders in several areas
of climate science, including atmospheric monitoring of the Southern
Hemisphere and understanding the causes of sea-level rise, the nation 
remains heavily reliant on coal for jobs and electricity. It mines more 
than half a billion tonnes of the stuff each year, and sells almost
three-quarters of that abroad. The rest is burnt in Australian power stations,
with electricity generation accounting for around one-third of the 
nation’s greenhouse-gas emissions. 

It's no coincidence that when Turnbull’s political colleague (and 
then-treasurer) Scott Morrison wanted to criticize environmentalists 
last year, he brought a lump of coal to parliament and spoke about it in 
glowing terms. Last week — after Turnbull confirmed he was quitting 
politics — his son complained about the “undue level of influence” of 
the coal lobby. Morrison, who replaced Turnbull as prime minister, 
has yet to announce the fate of the disputed policy, the National Energy 
Guarantee, which would force emissions generators to show they are 
meeting annual standards. He has at least said that the country will 
not withdraw from the Paris climate agreement, a move being pushed 
by some government members. 

He should stand firm. Although the Paris agreement is weak 
compared with the scale of what is needed, it represents a political 
triumph and one that places so few binding demands on nations that 
any withdrawal would be little more than crowd-pleasing theatrics. 
And most of the crowd won't be pleased: a June poll showed that 59% 
of Australians saw climate change as a pressing threat and one that 
needed action — the highest percentage in a decade. 

A larger-scale survey last year of 38 countries showed a similar level 
of concern. But politicians in many of these places, even those fully 
behind the need for action on emissions, are also finding it difficult to 
follow through on pledges. Take Canada, where Justin Trudeau's
government last month announced it was scaling back plans for a carbon
tax. Last week, Nicolas Hulot, the French environment minister,
resigned, claiming that governments around the world are not taking
sufficient steps to tackle green issues such as climate change. And the
reckless stance of US President Donald Trump continues to erode 
climate regulations and embolden climate sceptics. New Zealand, for 
one, still has ambitions for emissions-reducing laws, but many of the 
other promises the country made in Paris, including actual cuts to
carbon emissions and boosts in foreign aid to help poorer countries
adapt, are weakening under political pressure.

Many of those poorer countries are on the front line and will suffer
heavily as the weather worsens. So will Australia. Droughts there are 
projected to increase in length and severity as a result of climate
change. Heatwaves, floods and bush fires are also linked to global
warming, and are predicted to become more common and more extreme.
The country’s island neighbours in the Pacific are likely to
be inundated as sea levels rise. As a result,
Australia, whose draconian refugee policy is a source of shame to 
many citizens, is likely to face an increase in climate refugees. 

That these topics are now routinely debated amid mounting public 
concern about global warming is a victory of sorts for scientists, who 
must continue their efforts to make the case for action, and to research 
and speak out about the consequences. And although the current
political drama in Australia paints a depressing picture, there is a glimmer
of hope. A decade after the financial crash wrested away attention and 
momentum, climate change is once again at the top of the political 
agenda. 

Things can change quickly in politics, and Australia has a chance to 
force that change. Already the opposition Labor party has promised 
a new emissions-reduction scheme. And next year, the country will 
again vote on its leader. For whoever wins that election, curbing climate 
change should be at the top of their to-do list — and they must be given 
the chance to hang around long enough to do so.


What is Life? 


The lectures of physicist Erwin Schrödinger
helped to change attitudes in biology. 


In the winter of 1943, the physicist Erwin Schrödinger invited the
Dublin public to hear him deliver a series of lectures he described as
“difficult” and that “could not be termed popular”. Some 400 people
were undeterred and were among the first to hear Schrödinger offer his
views on how physics could shed light on the puzzling ability of living
organisms to maintain molecular order and organization in the face of
what seemed to be the randomizing forces of nature.

Seventy-five years on, some of his ideas remain difficult,
controversial even. But they are popular, and are once again drawing people to
the Irish capital. Trinity College Dublin will this week host ‘Schrödinger
at 75 – The Future of Biology’, at which a stellar cast of speakers will
consider the future of disciplines ranging from ageing and plant science
to infectious disease and consciousness.

Schrödinger’s lectures were collected into what he called his “little
book”, What Is Life?, published in 1944 (see Nature 560, 548–550;
2018). Some consider it one of the most influential scientific books of
the twentieth century.

The book attracted scientists from other fields to the study of genetics
and the molecular mechanisms of life, among them physicist Francis
Crick and zoologist James Watson. But can the ideas in this slim volume
really supply sufficient motivation for such a diverse programme?

Critics have rightly argued that the book was neither particularly
original nor up to date. Schrödinger made the auspicious proposal
that the genetic material is an “aperiodic crystal”: a structure with a
specific but not periodic arrangement of atoms, encoding information
that somehow guides the development of the organism. That vision
resonated with Crick and Watson as they contemplated the structure
of DNA, but it wasn’t wholly original. As to how the genetic machinery
works, Schrödinger could only point out that it seems to suspend the
second law of thermodynamics. 

The impact of What Is Life? lies more in its spirit than its substance. 
Schrödinger presented the problem of life as a puzzle posed to no single
discipline. And his timing was perfect: biology was already changing 
to a mechanistic and microscopic science. This cross-disciplinary 
relevance applies equally to the topics addressed at the Dublin
meeting. The physical-sciences content of artificial intelligence and complex
systems is obvious, but understanding of (say) cognitive neuroscience,
learning and memory and infectious disease can also benefit from
wide-ranging expertise: for example, from the study of network
topologies, the thermodynamics of information, and ergodicity (how widely
a dynamic system explores its available states). 

Happily, chemistry is welcomed to this table too. That subject, after 
all, is what biologists relied on mid-century to probe and better
understand DNA, enzymes and cell signalling. The subsequent emergence of
molecular biology, due in large part to some of those inspired by What
Is Life?, means that whether Nobel prizes get assigned to ‘chemistry’ or
‘physiology or medicine’ is now as arbitrary as whether Nobels in nuclear
science in the early twentieth century were awarded in chemistry or 
physics. 

What Is Life? made the case that profound questions about the natural 
world aren't owned by any academic discipline. Indeed, the Dublin 
meeting could have gone further by embracing Schrödinger’s epilogue
on determinism and free will, which invoked philosopher Immanuel 
Kant and Hinduism (and spoilt the book’s chances of publication in 
devoutly Catholic Ireland). Some eyebrows were raised at this material, 
but Schrödinger’s friend Albert Einstein would have seen nothing amiss
in it. Philosophers, ethicists, poets and theologians also have a stake in
the future of life. Perhaps they will be invited to the centenary.


The key to a happy lab life is in the manual


A well-crafted set of guidelines and advice can save time, reassure trainees
and promote a positive lab culture, argues Mariam Aly.


A year and a half ago, as I was preparing to launch my own
laboratory studying cognition at Columbia University in
New York City, I kept returning to a particular concern: I
would soon be responsible for the scientific advancement of trainees.
How could I help them be the best scientists they could be, while also
protecting their well-being?

I found the answer on Twitter. Two principal investigators in my
field, Jonathan Peelle at Washington University in St Louis, Missouri,
and Maureen Ritchey at Boston College in Massachusetts, shared
their lab manuals. These laid out expectations for themselves and
their trainees, as well as resources and tips to guide trainees through
their time in the lab. I decided to follow in their footsteps by writing
a lab manual to introduce my trainees to my philosophy for research
and work-life balance. This required a great deal
of time and thought, but it is something I would
recommend to anyone leading a research group.

In the final few months of my postdoctoral
studies, I thought about what had worked well
and not so well for me as a trainee, and how to
create best practices for my lab. Then I put into
writing things that are usually transmitted informally.
For example, that it doesn’t matter to me
whether trainees arrive at 9 a.m. or 1 p.m. or work
from home, as long as they get their work done
and honour their commitments. And I explicitly
encouraged trainees to talk to me if they need to
vent or feel they are foundering: academia can be
stressful, and I want to help.

I addressed concerns that I imagined trainees
would have: what if I make a mistake in my
experiment? (It’s OK, we all do; tell your collaborators right away so
that you can start to discuss the next steps.) Do I have to work 80 hours
a week to succeed? (No.) How do I ensure my results are reproducible?
(Double-check your code, add explanatory comments, document
every step of data analysis and use version control.) How do I
participate in open science? (Publicly share stimuli, code and data when you
submit a manuscript.)

I supplemented my lab manual (go.nature.com/2cl dxdt) with a wiki
(go.nature.com/2pti9kj), a website of resources for lab members. This
included everything from tools for learning the programming
languages R and Python and how to do neuroimaging analyses, to tips on
keeping up with the research literature (by using RSS feeds and Twitter)
and where to find the best bagel in Manhattan (a ten-minute walk from
the lab). My goal was that any newly accepted lab member could read
the manual and wiki and then strut into the lab knowing what to expect.

I try hard to keep myself accountable for what I have written.
For instance, I promised weekly meetings with each trainee, and
I stick to that, although it’s a challenge with teaching obligations
and travel. I hope that the consistency between my actions and my
words helps lab members to understand that I meant what I wrote,
even if they have yet to experience everything I promised.

I ask every trainee to read the lab manual. I make a point of
referencing it and the wiki, along with repeated, not-so-subtle examples of their
utility. My lab members now contribute to the wiki without
prompting. When I checked in with them to see whether these resources were
useful, the answer was a resounding ‘yes’. Their actions also suggest
that they internalized what they read. Some share struggles with me, 
ask for advice and take days off for mental health — as I hoped they 
would, and as I wish I had done when I was a trainee. 

Here's another example: my lab manual states that trainees are
entitled to read my grants, and my lab members have requested to see
them. That’s something I never asked my previous advisers; I worried 
it would be presumptuous. I realize now that my thinking was almost 
certainly wrong, but my own uneasy feelings as 
a trainee just drive home how important it is to 
put into writing that something is OK; otherwise,
trainees might assume it is not. That goes
double for the areas that trainees are most
sensitive about: I’ve written down in black and white
that it is OK to make mistakes and to maintain a 
work-life balance. 

Putting together a lab manual and wiki takes 
time, but there are several examples to use for 
inspiration. My lab manual and wiki are publicly 
available for anyone to use as starting points. Once 
the wiki has been written, the entire lab can help to 
maintain it; if everyone pitches in, any particular 
update will often take only a few minutes. 

The initial effort of writing a manual saves 
enormous amounts of time in the long run. I no 
longer have to repeatedly search my e-mails or the Internet to find 
the answer to a problem I previously solved but have forgotten.
Likewise, my trainees do not have to struggle to find answers to commonly
asked questions (for example, ‘how do I get after-hours access to the 
building?’). More importantly, having a lab manual requires you to be 
explicit and transparent about your expectations and what you promise 
to do for your lab — every trainee reads the same expectations in the 
same words, putting everyone on equal footing. 

A year after writing the lab manual, I re-read and revised it. That 
process reminded me of all that was at stake: all that I promised my 
trainees and all that I needed to do to ensure a healthy, happy and safe 
lab environment. It also led me to reflect on how pleased I am with 
my lab. My trainees are hard-working, sociable and supportive of one 
another. I love walking in and seeing them working together on a 
problem, or laughing and dancing when they’ve solved one. I might 
have written the lab manual, but my trainees brought it to life.


Mariam Aly is an assistant professor in the Department of Psychology
at Columbia University in New York City.
e-mail: ma3631@columbia.edu


STDs on the rise


The incidence of several sexually transmitted diseases is rising
steadily in the United States, the US Centers for Disease Control and
Prevention (CDC) said on 28 August. Nearly 2.3 million cases of
gonorrhoea, syphilis and chlamydia were diagnosed last year, up from
2.1 million in 2016. The number of people diagnosed with gonorrhoea
jumped by 67% between 2013 and 2017; syphilis diagnoses rose by 76%
and chlamydia by 22%. Without prompt treatment, these infections can
cause infertility and stillbirth, and increase the risk that a person
will contract HIV. The CDC also says that the bacteria that cause
gonorrhoea are becoming more resistant to antibiotics.


Carbon-free energy 


The California state legislature passed a bill on 29 August that would
require the state to generate 100% of its electricity from carbon-free
sources by 2045. It would also increase the state's existing
carbon-free-electricity mandate for 2030 from 50% to 60%. Governor
Jerry Brown must now sign the bill before it can become law. Once the
regulation is enacted, California will become the second US state to
establish such a policy. Hawaii was the first, enacting a similar
mandate in 2015. Massachusetts, New Jersey, New York and Washington DC
are also considering carbon-free-electricity mandates.


Bullying probe

An investigation into allegations of bullying is under way at the
prestigious Wellcome Sanger Institute in Hinxton, UK. According to
The Guardian, which reported on the investigation on 29 August, ten
former and current staff members have accused leaders at the institute
of bullying, mistreatment of staff and gender discrimination. The
complaints include allegations against the institute's director,
geneticist Michael Stratton. The investigation will “ensure full and
proper exploration of these allegations”, an institute spokesperson
said. The institute is owned by the Wellcome Trust, which unveiled a
landmark anti-bullying policy in May. The trust said that it is aware
of the investigation at Sanger, and that it would “await the outcome
of that process before commenting further”.


Brazil's national museum gutted by fire


A huge fire devastated Brazil’s National Museum in Rio de Janeiro on
the evening of 2 September. Many of the archaeological finds and
historical memorabilia (some 20 million items in all) are now feared
to have been destroyed; museum officials told local media that as
little as 10% of the collection might have survived. The 200-year-old
building housed several landmark collections, including Egyptian and
Greco-Roman artefacts and the oldest human-skull fossil found in the
Western Hemisphere. It’s not yet clear what caused the fire, but the
lack of a sprinkler system, the dilapidated state of the building and
the failure of the two fire hydrants closest to the museum have all
been blamed for the extent of the damage. Brazil's education minister,
Rossieli Soares, told reporters on 3 September that the federal
government would spend an initial 15 million reais (US$3.6 million) to
restore the structure and rebuild its collection, and would seek
international help.


Research-ship clash 
Japan’s foreign ministry lodged a complaint with the South Korean
government on 28 August, after a Korean research ship was spotted near
a group of small islands that both countries claim as their own. The
islets, known as Dokdo in Korea and Takeshima in Japan, are controlled
by South Korea, but sit in a joint fishing zone where the nations’
claims also overlap. Japan asserts that the ship, Tamgu 20, was seen in
Japan's exclusive economic zone southwest of the islets, where foreign
research vessels require permission to operate. According to Japan's
coast guard, the ship, managed by South Korea's National Institute of
Fisheries Science (NIFS) in Busan, dropped a black object into the sea
and left the zone before noon. The NIFS has confirmed that the ship was
conducting marine research on that day, but gave no details of its
location or activities.


Salk lawsuit claims 
A California court has thrown out a retaliation claim in a
gender-discrimination lawsuit against the Salk Institute for
Biological Studies. On 30 August, a judge dismissed molecular
biologist Beverly Emerson's claim that the institute in La Jolla,
California, let her contract expire in 8 December 2017 because of the
suit she had filed that July. The court also ruled that a key piece of
evidence for the claim (an e-mail from Salk’s former president,
Elizabeth Blackburn, suggesting that litigation might hurt Emerson's
career) is confidential material that should not go before jurors.
Emerson alleges that systemic bias at the institute limited her pay
and professional advancement and blocked her from resources such as
research funding. The gender-discrimination trial is scheduled for
7 December.


PEOPLE 


French minister 
François de Rugy, speaker of the French National Assembly, was
appointed France's environment minister on 4 September. De Rugy is a
former green-party politician who switched to President Emmanuel
Macron’s party, La République En Marche, in last year’s legislative
elections. His predecessor, Nicolas Hulot, resigned his post on
28 August during a dramatic live interview on the radio station France
Inter. Hulot (pictured), a popular environmental activist who became
Macron's environment minister last May, had declined previous offers to
join government, saying that he had more influence as an activist. But
a year after taking the job, Hulot expressed frustration with the slow
pace of progress in politics, and said that short-term demands on
government detracted from its ability to tackle long-term issues such
as climate change and the decline of biodiversity and natural
resources.


SPACE


Mars-rover vigil


NASA is waiting for a massive dust storm on Mars to ebb before
attempting to waken the sleeping Opportunity rover, the agency said on
30 August. The 14-year-old rover went silent on 10 June, after the
storm obscured the sunlight that the spacecraft needs to survive. When
the dust abates, NASA will listen for signals from the rover, and send
it messages, for at least 45 days. The agency will continue to listen
in a more passive mode until at least January 2019. By then, the
Martian summer will have given way to autumn at Opportunity’s landing
site. That change in seasons could kick up windstorms and clear any
dust coating the rover’s solar panels. The recent Martian dust storm is
one of the most extensive ever seen on the red planet.


TREND WATCH


An analysis of cancer-drug trials published in leading journals finds
that about one-third of US authors failed to fully disclose payments
from trial sponsors. Researchers led by Cole Wayant, a meta-researcher
at Oklahoma State University in Tulsa, searched for cancer drugs
approved by the US Food and Drug Administration between January 2016
and August 2017. They then looked for related clinical trials and
identified the key publication for each trial. For any US
physician-oncologist authors named on a study, Wayant’s team looked at
whether the authors had received funding from a trial sponsor during
the period of the trial. Under the US Affordable Care Act, such
payments must be declared by drug companies in the publicly available
Open Payments database. The researchers cross-referenced these payments
with conflict-of-interest declarations made by the authors in the
journal articles, and added up the total amounts disclosed and
undisclosed for each article. Of the 344 oncologist authors for whom
the researchers had data, 110 (32%) had not fully disclosed payments
from the sponsor. Together, these authors received about
US$217 million in payments.

Source: C. Wayant et al. JAMA Oncol. https://doi.org/10.1001/jamaoncol.2018.3738 (2018).


Higgs decay 

The ATLAS and CMS experiments at the Large Hadron Collider (LHC) have
observed a previously undetected way in which the Higgs boson can
decay: into a particle called the bottom quark, and its antiparticle.
The experiments, based at CERN, Europe’s particle-physics laboratory
outside Geneva, Switzerland, discovered the Higgs, a key part of the
mechanism that gives other particles their masses, in 2012. LHC
researchers have accumulated evidence of the particle decaying into a
variety of products, following theoretical predictions, including
decaying into two photons. In June, researchers revealed that they had
also seen the Higgs interact with the top quark. The bottom-quark
decay, announced on 28 August, is expected by theory, but the signal
had been difficult to single out from the many other processes that can
also produce those particles.


Retraction report 
A Chinese university has concluded that the authors of a controversial
gene-editing paper that was later retracted did not intend to deceive
the scientific community. The paper detailed how an enzyme called
NgAgo could edit genomes with similar accuracy to the CRISPR-Cas9
gene-editing system (F. Gao et al. Nature Biotechnol. 34, 768–773;
2016). But the paper's main finding was within months challenged by
scientists who failed to reproduce the results. On 2 August last year,
the authors agreed to retract the paper. Last week, Hebei University of
Science and Technology in Shijiazhuang announced that its investigation
found no basis for thinking that the original experiments should be
republished. (Nature’s news team is editorially independent of Nature
Biotechnology.)


CANCER-DRUG PAYMENTS
One-third of 344 oncologist researchers involved in drug trials
published in leading oncology journals failed to fully disclose payments
from companies that sponsored the studies.


NEWS IN FOCUS


European scientists seek ‘epigenetic clock’ to test age of refugees p.15

Enormous wildfires spark scramble to improve models p.16

A radical open-access plan that could end journal subscriptions p.17

AI techniques mine social media for clues about gun violence p.20


Indian soldiers rescue residents from a flooded area in the southern state of Kerala. 


SUSTAINABILITY 


Kerala floods made worse 
by mining and dams 


Scientists say development boom in the Western Ghats mountains contributed to the disaster. 


BY T.V. PADMA 


Torrential rains pounded southwest India in August, triggering
devastating floods in the state of Kerala that have so far killed at
least 483 people and forced hundreds of thousands from their homes. The
monsoon rains have been heavier than usual, but scientists say that
outdated dam-management systems and increasing mining and development
in the Western Ghats mountain range (a biodiversity hotspot that
ecologists are trying to conserve) have exacerbated the disaster.

Kerala received 758.6 millimetres of rain between 1 and 19 August,
2.6 times the average for that time of year. The unusually heavy
downpours caused rivers to overflow. Many of the fatalities were the
result of landslides in rural areas, triggered by the massive
downpours. Authorities say the floods are the state's most damaging in
100 years.

A contributing factor is that after the heavy
rain, authorities began to release water from
several of the state’s 44 dams, where reservoirs 
were close to overflowing. The neighbouring 
state of Tamil Nadu also purged water from its 
over-filled Mullaperiyar dam, which wreaked 
yet more havoc downstream in Kerala. 
Scientists say state governments often allow 
reservoirs to fill completely early in the mon- 
soon season, and do not release water slowly at 
regular intervals to prevent overfilling later in 
the season. “India’s reservoir management is 
unscientific,” says meteorologist Madhavan
Nair Rajeevan, secretary of India’s ministry of 
Earth sciences, which oversees the country’s 
meteorological institutes. Computer mod- 
els and meteorological forecasts are used in 
Europe and the United States to predict the 
rate at which water flows into reservoirs and 
how much water needs to be stored — but 
few authorities in India use such systems, says 
Rajeevan. He suggests that prediction systems 
should be introduced across India. 

An increase in development over the past 
two decades in the Western Ghats — a large 
mountain range that runs parallel to India’s 
west coast and across several states — might 
also have exacerbated the flooding, say several 
Indian scientists. “The land and soil distur- 
bances have triggered landslides and blocked 
streams, contributing to the floods,” says 
Madhav Gadgil, an ecologist at Goa University 
in Taleigao. 

In 2011, Gadgil headed a committee that 
investigated environmental damage from 
unsustainable development and illegal min- 
ing in the Western Ghats. The committee rec- 
ommended that the entire mountain range be 
declared “ecologically sensitive” — it contains 
30% of India’s plant, fish, bird and mammalian 
species — and that mining and the construc- 
tion of dams and coal-fired power plants be 
banned to conserve biodiversity. 

But the government ignored the report's rec- 
ommendations. Instead, in 2013, it accepted 
the advice of another committee, which sug- 
gested that only 37% of the Western Ghats be 
made off-limits to mining and construction. 

Gadgil says that the state governments have 
continued to approve infrastructure projects 
across the Ghats, including dams, power plants 
and buildings, many without reliable environ- 
mental-impact assessment reports. “There 
has been a proliferation of building and road 
construction,” he says. He adds that there’s also 
been an increase in illegal mining. 

Jason von Meding, who studies disaster risk- 
reduction at the University of Newcastle in 
Australia, says the government should explain 
why it rejected the Gadgil-committee report, 
which emphasized the need to curb develop- 
ment excesses and focus on conservation. 
“Uncontrolled mining, dam construction, 
deforestation and poorly planned construc- 
tion have multiplied the risk of flooding and 
landslides in recent years,” he says. 

Earth scientist Rajiv Sinha at the Indian Insti- 
tute of Technology Kanpur also points to an 
increase in the numbers of canals and bridges, 
which can reduce the width of rivers, leading 
to a build-up of sediment that slows water flow.
After a sudden downpour, there is not enough
space for the water, so it floods the surrounding
area, “leading to disasters like the one we are
witnessing in Kerala”, says Sinha. India’s poor
infrastructure planning will be exacerbated
by its vulnerability to extreme rainfall events,
which are projected to happen more frequently
as a result of global warming, he says.
PUBLIC HEALTH


Experimental Ebola 
drugs face tough test 


Researchers are devising a clinical-trial protocol to test 
three medicines in Africa’s latest outbreak. 


BY AMY MAXMEN 


Health workers fighting the ongoing
Ebola outbreak in the Democratic
Republic of the Congo (DRC) have 
given nearly 20 people experimental drugs 
to treat the virus since mid-August. But 
because the drugs have been dispensed on 
a case-by-case, ‘compassionate use’ basis, it 
is hard to know whether any are effective. 
Now, desperate to determine which therapy 
works best, researchers from the DRC and 
US governments, the World Health Organi- 
zation (WHO) and other groups are meet- 
ing to plan a clinical trial that will compare 
multiple drugs as the outbreak continues. 
For ethical reasons, the trial scientists do 
not intend to give any study participants 
a placebo. Instead, they hope to compare 
the two experimental medicines already 
in use to ZMapp, an antibody therapy that 
showed promise three years ago during a 
major Ebola epidemic in West Africa (The 
PREVAIL II Writing Group. N. Engl. J. Med. 
375, 1448-1456; 2016). Patients in the com- 
ing trial would receive one of these three 
drugs at random. The study design draws 
on a flexible clinical-trial framework that 
the WHO expects to unveil this week. The 
framework is intended for use in multiple 
Ebola outbreaks, to produce data that can be
pooled over time.

The scientists working on the DRC trial
hope to launch the effort in the coming
weeks. “A clinical trial will give us the scientific
evidence we need,” says Jean-Jacques
Muyembe-Tamfum, director-general of the 
National Institute for Biomedical Research in 
Kinshasa, which will lead the study. 

But planning for the trial is complicated by 
the realities of working in the DRC’s North 
Kivu and Ituri provinces, where fighting has 
killed more than 5 million people over the 
past two decades. Instability in the region 
could prevent clinicians from carrying out 
their work. “Armed groups can do what they 
want,” Muyembe-Tamfum says.

The current outbreak began on 1 August, 
and has grown to include 115 confirmed 
and probable cases of Ebola — including 
77 people who have died, the DRC health
ministry said on 28 August. Public-health
workers have vaccinated 4,645 people, and 
doctors have given 3 people the antiviral 
drug remdesivir, made by Gilead Sciences of 
Foster City, California. Another 13 patients 
have received mAb114, an experimental 
treatment derived from antibodies found in 
the blood of a person who contracted Ebola 
in 1995 and survived. 

That swift response is a major shift from 
the handling of the Ebola epidemic that 
struck West Africa in 2014. Experimental 
drugs were not used widely in West Africa 
then because there was no proof of their 
safety or efficacy — clinical trials did not 
begin until the outbreak was near its end. 
That delay helped to drive the death rate 
among Africans infected with Ebola to 
63%. But several Westerners infected with 
Ebola received the nascent therapies at top 
hospitals; the fatality rate for this group of 
patients was just 18%. This disparity eventu- 
ally prompted the WHO to develop guide- 
lines aimed at ensuring wider access to 
experimental treatments during future Ebola 
outbreaks. 

But the only way to determine how well a 
drug works is through a randomized, con- 
trolled clinical trial. Thus far, researchers 
have not managed to complete a trial of any 
experimental Ebola drug, because outbreaks 
of the disease have ended before enough 
patients enrolled. So the WHO has been 
working with international experts to create 
a basic trial design that can be adapted as data 
accumulate and logistical challenges change. 

Muyembe-Tamfum says that the trial 
being planned now will use that framework, 
and is likely to test mAb114, remdesivir and 
ZMapp, made by Mapp Biopharmaceutical 
in San Diego, California. 

Stationing a large number of medical 
professionals in an Ebola unit now is par- 
ticularly fraught because there are more than 
100 militias roving the eastern DRC. If armed
groups show up at a treatment centre, workers 
might flee rather than risk their lives — and 
any trial could stop. But Ana Maria Henao 
Restrepo, who helps to lead the WHO's 
Ebola Research and Development team, is 
unfazed. “Every trial has its own challenges,” 
she says. “That's why we are coming out with 
an approach that’s flexible.”




Around 4 million refugees have arrived in Europe since 2014, many of whom have no identity documents. 




DNA clock may aid 
refugee age check 


European forensic scientists want to find out whether 
epigenetics can help determine which refugees are under 18. 


BY ALISON ABBOTT 


When authorities in Hildesheim,
Germany, didn't believe an asylum
seeker who claimed to be under
18 years old — and thus eligible for privileged 
treatment — police turned to a blood test sold 
by Zymo Research in Irvine, California. 

The test uses chemical modifications to 
DNA that accrue over a lifetime, called an 
‘epigenetic clock’, to determine a person's age.
Scientists, aware of its potential benefits but
also of its current lack of precision, sounded
an alarm.

In a paper published¹ in May, a team led
by forensic-medicine specialist Stefanie
Ritz-Timme of the University of Düsseldorf in
Germany said that these tests were not ready 
for use in sensitive forensic evaluations. 

But now, in the charged political atmosphere 
that has accompanied the arrival of millions 
of refugees in Europe, researchers are joining 
forces to improve epigenetic-clock-based tests 
— with a focus on whether they might be used 
to help determine the age of refugees whose 
claims to be under 18 are disputed. 

“The race is now on to develop a more 
accurate clock that would be more predictive
than the anatomical tests — and also more 
practical,” says cell biologist Wolfgang Wagner 
at the University of Aachen, Germany. 

The development of methods that could feed 
into decisions about who is granted asylum 
and how refugees are treated is likely to elicit
criticism, says Denise Syndercombe-Court, a 
forensic geneticist at King’s College London. 
She says that some scientists, herself included, 
are wary of these efforts. 

But Niels Morling, a forensic geneticist at the 
University of Copenhagen who is running a 
national epigenetic-clock programme, defends 
the research. Given that the law treats those 
under 18 very differently from adults, he says, 
“you have a duty to make sure that it can be 
implemented fairly”. 

Philosopher Thomas Pogge, who specializes 
in global justice at Yale University in 
New Haven, Connecticut, says that, to keep 
rising anti-immigration sentiment in check, it 
is important for authorities to show that they 
can detect any refugees who pretend to be 
younger than they are. 

Since 2014, around 4 million refugees 
have arrived in Europe, many without iden- 
tity documents. Minor status usually leads 
to better care, an increased chance of being 
granted asylum and a higher chance of gaining
permission to be joined by family members. 

Authorities say that some unaccompanied 
refugees claim to be younger than they are. But 
the anatomical tests that are currently used in 
some countries to assess age have an error 
range of up to 3-4 years and rely on X-rays 
and magnetic resonance imaging. In Norway, 
refugees have sued authorities for being forced 
to submit to these medical tests. 

The publication of the first reasonably 
accurate epigenetic clock in 2013 presented a 
simpler way of verifying age, because the test 
could be done using blood samples². Developed
by biostatistician Steve Horvath at the
University of California, Los Angeles, this clock 
measured an epigenetic mark called methyla- 
tion at 353 DNA sites across the genome. 

In July, Horvath and his team published a 
new clock that measures epigenetic marks at 
391 DNA sites³. It was particularly accurate in
buccal cells scraped from the inside of the cheek, 
which are easier to collect than blood samples. 
Testing these cells from 53 people aged between 
3.5 and 18 years, he found a median error of just
1.03 years. However, there were many outliers 
— people whose age could not be accurately 
predicted — and the most extreme result was 
out by 5 years and 8 months. Horvath expects 
that epigenetic clocks, once refined, will help 
refugees by corroborating their age claims. 
“At the same time, the tests may help identify 
individuals who break the law,” he adds.

In the Hildesheim case, Zymo, which bought 
an exclusive licence to Horvath’s test in 2016, 
compared the refugee’s sample with those of 
five others who had similar ethnic backgrounds 
and whose ages were known. Keith Booher, 
project manager for Zymo’s epigenetic services,
told Nature that the test determined the most 
likely age of the person to be between 26 and 
29 years old. The Hildesheim authorities have 
declined to comment on the case. 

Forensic scientists around Europe are now 
working to make an epigenetic clock that 
would be more accurate and less expensive 
than what is available — and that is applicable 
to the easily collected buccal cells. Studies are 
also under way to gauge how the diverse ethnic 
backgrounds of Europe's refugees might influ- 
ence the epigenetic clock. Another challenge 
to using the clock is dealing with the people 
who are statistical outliers. Wagner thinks that 
some of these people, for whom the method 
simply won’t work, could be identified using
the right combinations of epigenetic markers, 
allowing scientists to discount the test’s results. 

Even if the accuracy of the epigenetic clock 
cannot be improved to rival anatomical tests, 
it would still be attractive, says Morling. “Every 
time you add a new method, you improve 
overall precision of age estimation.”
RESEARCH CAPACITY


World Bank 
invests in Africa 


Grassroots science initiative 
receives US$280 million. 


BY LINDA NORDLING 


A World Bank scheme aimed at building
science capacity in Africa has
announced a third, and probably final,
investment worth at least US$280 million. The 
initiative loans money to African governments, 
and has set up 46 education and research centres 
in 17 African countries — but some worry what 
will happen once the bank’s money runs out. 

“I see a big challenge when the funding 
ends,” says Patrick Ogwang, who leads a
traditional-medicine research centre funded by the
initiative, at Mbarara University in Uganda. He 
is eyeing industry partnerships as a source of 
future cash, but says that competition is fierce. 

The World Bank launched the African Cen- 
tres of Excellence (ACE) initiative in 2014 with 
$165 million in loans; it created 22 centres in 
West and central African nations. Two years 
later, the bank approved $148 million to cre- 
ate 24 hubs in eastern and southern African 
countries. The third round, announced on 
31 August, pushes the bank’s total investment 
past $500 million. It again targets West and cen- 
tral Africa, and French development agency the 
AFD may add another $50 million. 

The centres focus on local research challenges 
such as plant breeding and infectious diseases, 
and have created jobs for hundreds of scientists 
and trained thousands of graduate students. 
Centres are eventually expected to sustain 
themselves with funding from governments, 
charities and industry. It’s important that the 
centres move towards this, says World Bank 
economist Andreas Blom, who leads the pro- 
gramme, but the third round of loans will offer 
“weaning off” funding for existing centres in 
West and central Africa, and pay for new ones. 

Critics of the scheme say that it has allowed 
governments to delay making substantive 
national investments in research. Govern- 
ments have 40 years to repay the money at low 
or zero interest. “Many African governments, 
with short political life spans, are not really
concerned about who will pay and how,” says John
Mugabe, an expert on science policy in Africa at 
the University of Pretoria in South Africa. 

Representatives from the Ghanaian and 
Nigerian governments told Nature that the 
ACE loans complement their plans for national 
funding. In Ghana, higher-quality research pro- 
posals will reach the country’s national research 
fund thanks to the scheme, says Mohammed 
Salifu, executive secretary of Ghana's National 
Council for Tertiary Education.
Firefighters battle a blaze near Redding, California, in July.


Huge wildfires 
defy explanation 


Researchers scramble to improve wildfire models as blazes 
become larger and less predictable. 


BY JEFF TOLLEFSON 


In California, where the state’s largest
wildfire on record continues to burn,
fires are getting bigger and less predictable,
so much so that scientists are struggling
to model them. Now, two research
projects under way in the state are aiming
to revamp the models that scientists,
first responders and policymakers use to
understand these dangerous and costly
disasters.

One, slated to wrap up in the next few
months, looks at how specific environmental
factors such as extreme winds
affect fires. The other, officially launched
on 30 August, focuses on how wildfires
will change in the coming decades as the
climate warms.

“Something is definitely different, and it
raises questions about how much we really
know,” says Max Moritz, a fire scientist
at the University of California, Santa
Barbara.

The efforts come against a backdrop of
abnormal fire seasons around the world.
The giant California fire has torched about 
166,000 hectares since late July, and con- 
tinues to burn in the northern part of the 
state. British Columbia in Canada is now 
experiencing its worst fire season on record 
(see ‘Scorched earth’). And in late July, after
weeks of intense heat and some of the
lowest rainfall totals since the late nineteenth
century, officials in Sweden were
battling roughly 50 wildfires across
the country.

Researchers have been at a loss to
explain a flurry of unusual fire behaviour 
in California in recent years: wildfires that 
burn hot throughout the night instead of 
settling down, as many used to; blazes that 
race down hillsides faster than before; and 
fires that torch suburban neighbourhoods 
that were once considered safe from such 
events. And, in July, a tornado with unprecedented
wind speeds of 230 kilometres per
hour spun up inside a fire near Redding,
California. 

The problem, Moritz says, is that most 
of the fire models in use today are based on 
data from the past two or three decades. But 
it seems that fire behaviour might be shifting 
in response to climate faster than anybody 
expected, and that makes it increasingly 
problematic to extrapolate from past trends,
he adds.


BAD BEHAVIOUR 

“More frequent, extreme fire behaviour is 
actually sort of expected, but just saying that 
it’s going to happen isn’t enough,” says Dave 
Sapsis, who specializes in fire modelling and 
behaviour at the California Department of 
Forestry and Fire Protection (CAL FIRE), 
based in Sacramento. “We need to refocus 
some of our research efforts on characterizing 
the kinds of fire behaviour that cause us the 
most grief.”

As part of one of the projects, Sapsis is 
updating the model that CAL FIRE uses 
to map fire hazards across the state. In use 
since 2007, the model incorporates informa- 
tion about environmental conditions such 
as topography, fire history and the type of 
burnable vegetation in an area. But it doesn't 
capture how extreme winds can move 
through a local landscape. Those winds are 
the key to understanding urban conflagra- 
tions, Sapsis says. 

Within the next few months, he hopes to 
complete work on a detailed record of wind 
speed and direction across the entire state over
the past 15 years. Those wind maps should
help scientists to study recent fires and, ultimately,
boost CAL FIRE’s ability to predict
the risk of extreme fires in any given locality,
Sapsis says.

SCORCHED EARTH

Blazes in the United States and Canada have
burnt nearly 4 million hectares this year,
and fire seasons aren’t over yet.

2018 wildfire sites

Climate scientists expect those risks to 
increase in the coming decades. California's
Fourth Climate Change Assessment, 
released on 27 August, projects that the area 
of land consumed by wildfires in the state 
each year could increase by 77% by 2100 if 
global greenhouse-gas emissions continue 
to rise. On average, more than 286,000 hec- 
tares have burnt each year over the past two 
decades. 


FUTURE ON FIRE 

The second project, a US$4-million study 
that includes Moritz and other scientists at 
multiple University of California campuses, 
will explore the future of fire, ecosystems and 
climate in California. Much of the existing 
research has focused on extrapolating from 
past trends. But this study is aiming to create 
a more realistic picture of how wildfires and 
ecosystems will evolve by integrating detailed 
models of fire behaviour, vegetation and cli- 
mate across the entire state. 

This should allow scientists to analyse 
how more-extreme and variable weather 
will affect wildfires and how ecosystems will 
respond to them, says Alex Hall, a climate 
scientist at the University of California, Los 
Angeles, and the project’s principal
investigator.

A lot of work has focused on tracking 
average fire trends, Sapsis says. But scientists 
need to improve their understanding of the 
extreme blazes, as well as how fire patterns 
could shift in the future, he adds. This will 
help government agencies and communities 
make better choices when it comes to manag- 
ing ecosystems and human developments in 
fire-prone areas.
Radical plan to end paywalls


Top European research funders announce ‘Plan S’ to make all scientific works free to read. 


BY HOLLY ELSE 


Research funders from France, the
United Kingdom, the Netherlands
and eight other European nations have
unveiled a radical open-access initiative that 
could change the face of science publishing 
in just two years — and which has instantly 
provoked protest from publishers. 

The 11 agencies, which together spend 
€7.6 billion (US$8.8 billion) in research 
grants annually, say they will mandate that, 
from 2020, the scientists they fund make 
resulting papers free to read immediately on 
publication. The papers would have a liberal 
publishing licence that would allow anyone 
else to download, translate or otherwise reuse 
the work. “No science should be locked behind
paywalls!” says a preamble document that
accompanies the pledge, called Plan S, released 
on 4 September. 

“It is a very powerful declaration. It will be 
contentious and stir up strong feelings,” says 
Stephen Curry, a structural biologist and open- 
access advocate at Imperial College London. 
The policy marks a “significant shift” in the 
open-access movement, which has seen slow 
progress in its bid to make scientific literature 
freely available online. 

As written, Plan S would bar researchers 
from publishing in 85% of journals, including 
influential titles such as Nature and Science. 
According to a 2017 analysis, only around 15% 
of journals publish work immediately as open 
access (see ‘Publishing models’) — financed 
by charging per-article fees to authors or their
funders, by negotiating general open-publishing
contracts with funders, or through other means. 

More than one-third of journals still publish 
papers behind a paywall, and typically permit 
online release of free-to-read versions only after 
a delay of at least six months. And just less than 
half have adopted a ‘hybrid’ model of publish- 
ing, whereby they make papers immediately 
free to read for a fee if an author wishes, but keep
most studies behind paywalls. Under Plan S,
however, scientists wouldn’t be allowed to
publish in these hybrid journals, except during a
short transition period. The plan also states that 
funders will cap the amount they are willing to 
pay for open-access publishing fees, but doesn't 
lay out what charge would be too much. 

The initiative is spearheaded by Robert-Jan
Smits, the European Commission's special
envoy on open access, and was launched
by the advocacy group Science Europe (the
‘S’ in Plan S can stand for ‘science, speed, solution,
shock’, Smits says). National agencies in
Austria, Ireland, Luxembourg, Norway, Poland
and Slovenia have also signed, as have funders
in Sweden and Italy.

PUBLISHING MODELS

Worldwide, the proportion of subscription-only journals* shrank between 2012 and 2016,
giving way to more open-access (OA) and hybrid journals.

Proportion of journals published

                      2012      2016
Subscription only     49.2%     37.7%
Hybrid                36.2%     45%
Open access           12.4%     15.2%
Delayed OA            2.1%      2.2%

*From Scopus database. Hybrid journals are subscription titles that allow authors to make individual papers open for a fee.
Percentages do not add up to 100% because of rounding.

Smits says he took inspiration from the open- 
access policy of the Bill & Melinda Gates Foun- 
dation, the global health charity based in Seattle, 
Washington, which also demands immediate
open-access publishing. Because Plan S forbids 
hybrid publishing — and because it involves 
multiple funders — its impacts could be more 
far-reaching than the Gates policy. 

Despite Smits’ role, the European Commission
hasn’t itself signed the plan. But Smits says
that he expects the requirements to be inte- 
grated into the terms of future research grants 
from the commission. He also expects more 
funding agencies to join, and says he will dis- 
cuss the plan in the United States next month. 


Asked for comments on the plan, publishers 
said they had serious concerns — particularly 
around the banning of hybrid journals. A 
spokesperson for the International Association 
of Scientific, Technical and Medical Publishers 
(STM) in Oxford, UK, which represents 145 
publishers, told Nature’s news team that it 
welcomed funders’ efforts to expand access to 
peer-reviewed scientific works, but that some 
sections of Plan S require careful consideration 
to avoid “any unintended limitations on academic
freedoms”. In particular, the spokesperson
said, banning hybrid journals, which have
broadened the availability of open-access articles,
could “severely slow down the transition”.

The publisher Elsevier said it supported the 
STM’s comments. A spokesperson for Springer 
Nature said: “We urge research funding agencies
to align rather than act in small groups in
ways that are incompatible with each other.”
Removing publishing options from researchers
“fails to take this into account and potentially
undermines the whole research publishing system”,
the statement added. (Nature’s news team
is editorially independent of its publisher.) 

Curry cautions that shifting from a subscrip- 
tion to an open-access business model around 
the world could also bring a new challenge — 
how scientists in poorer nations will be able to 
afford to publish open-access work. “That has 
to be part of the conversation,” he says.
A WINDOW ONTO
GUN VIOLENCE 


Gang-related shootings plague many US cities, and researchers 
are trying to tackle the problem using artificial intelligence. 


BY ROD MCCULLOM


In the middle of the day on 11 April 2014, a
hooded gunman ambushed Gakirah Barnes
on the streets of Chicago's South Side. A
volley of bullets struck her in the chest, jaw and
neck. The 17-year-old died in a hospital bed
two hours later.

To many, her death was just another grim
statistic from a city that has been struggling
with gun violence. Last year, around 3,500 people
were shot in Chicago, Illinois, of whom 246
were aged 16 or younger; 38 of those children
never celebrated another birthday.

But Barnes's death was unusual for several
reasons. She was a young woman in an epidemic
of violence that largely affects black
men. She also had an Internet following.
Barnes had a reputation as a ‘hitta’, or killer,
with rumours of at least two dead bodies
to her credit. Although never charged with 
murder, she embraced the persona, posing in 
photos and videos with guns in her hands and 
making threats against rival gangs on Twitter. 
In a morbid modern irony, it’s likely that she 
revealed her location in real time to her killer 
through a tweet. Police have yet to charge 
anyone in connection with her murder.

Desmond Upton Patton was sitting in his
office at the University of Michigan in Ann
Arbor when he first saw the headlines about
Barnes. The social worker had been studying
‘Internet banging’, or ‘cyberbanging’, the
use of social media by gang-involved youths
to challenge, taunt or threaten rivals¹. The
online disputes can often spill out into the
streets as physical violence.

Patton took a deep dive into Barnes’s 
archived Twitter timeline and discovered a 
treasure trove of social-media data — random 
thoughts as well as boasts, threats and violent 
imagery. But what surprised him most, he 
says, was the grief. “My pain ain't never been 
told,” Barnes wrote after a friend was killed just
weeks before her own death. 

What emerged from her timeline was a 
picture of a teenage girl who lived in a community
steeped in violence, who was deeply hurt by
it and who wanted revenge. Now at the Columbia
School of Social Work in New York City,
Patton thinks that social-media histories such as 
that of Barnes can offer ways to identify young 
people at risk of being involved in gun violence. 
He assembled an interdisciplinary group of
researchers who use artificial-intelligence
(AI) techniques to study the language
and images in social-media posts to identify
patterns of grieving and anger.

By developing tools to automatically 
recognize these telltale emotional signs, 
Patton hopes to provide a way for community 
organizations to intervene before digital fights 
turn deadly. Programmes in Chicago are start- 
ing to take notice. “Our violence-prevention 
outreach has to change because gangs have 
changed,” says Eddie Bocanegra, senior direc- 
tor of READI Chicago, an initiative aimed at 
reducing gun violence. 


Gang-related graffiti 
in Chicago, where 
around 3,500 people 
were shot last year — 
246 were minors. 


GUN VIOLENCE AND SOCIAL MEDIA 

The United States leads the world in gun 
violence against children. About 1,300 minors 
die in shootings each year and another 5,800 
are injured, according to researchers at the US 
Centers for Disease Control and Prevention².
Gun-related injuries are the third leading cause 
of death for children aged 17 and younger. And 
the death rate for African American children 
is ten times higher than for white and Asian 
American children. 

Patton and others have found that social 
media has exacerbated gun violence among 
young people and changed how gangs recruit 
members, conduct business and initiate 
violence. It has become an acute problem 
in Chicago, Patton says, where street gangs 
have splintered into small, unruly crews with 
younger members who often settle minor dis- 
putes — such as insults on social media — with 
deadly force. 

Violence and grief are common themes in 
Barnes’s online and offline history. Between 
December 2011 and her death in 2014, she 
tweeted nearly 27,000 times. She adopted 
Facebook and Twitter names that paid homage 
to slain friends, and she vowed vengeance on 
their killers. 

Barnes started hanging around with a local
street crew in her early teens. In early 2011, a
friend, Shondale ‘Tooka’ Gregory, was killed by
gunfire while waiting for a bus. He was 15 years
old. The crew started to refer to its territory as
‘Tookaville’ in memoriam, and Barnes began
using ‘TookavilleKirah’ as her Facebook name.

Later that year, when a member of a rival 
gang, 20-year-old Odee Perry, was killed just 
blocks away, Internet chatter suggested that 
Barnes was the shooter. She neither confirmed 
nor denied the speculation, which probably 
enhanced her online mystique. 

In a survey of young black people in Chicago,
nearly half reported that they had witnessed a
gang-related killing. Such experiences are
especially detrimental to the adolescent brain,
which is still developing, says Karen Sheehan,
a paediatric emergency-room physician and
professor at Lurie Children’s Hospital and
Northwestern University’s Feinberg School of
Medicine in Chicago. It affects the frontal lobe,
she says, and diminishes the capacity to make
good decisions. And the stress of grieving can
exacerbate these deficiencies.

In 2012, Barnes experienced another tough
loss. She witnessed first-hand as Tyquan Tyler,
a 13-year-old boy and close friend, was killed
by a stray bullet at a neighbourhood party.
Barnes changed her Twitter username to
‘TyquanAssassin’ in memoriam, and her activity
on the platform increased considerably.

Patton was most interested in the posts
Barnes made in the days leading up to her own
death. On 28 March 2014, 19-year-old Raason
‘Lil B’ Shaw, a member of her crew, was killed
by Chicago police after he allegedly pointed a
handgun at them during a foot chase. It was
on the same street where Tyler had died, and
Barnes was soon expressing her profound
grief online. She renamed her Twitter profile
‘No Surrender Lil B’ and tweeted darkly, presciently,
“In da end we DIE”, on 10 April. The
very next day, Barnes was dead.




GRIEF AND LOSS 

The deaths of Shaw and Barnes helped motivate
Patton to develop SAFElab, a research initiative
that explores how urban youth of colour navigate
their lives and express violence across social
media. Patton and his team looked closely at
2,256 tweets posted in the weeks surrounding
the killings from 235 unique users, all interacting
with Barnes and her network. Patton’s goal
was to sort tweets into categories. There were
those expressing loss, including expressions
of grief or sadness and descriptions of trauma,
death or incarceration. And there were others
signalling aggression: insults and threats of
violence or the desire for revenge. Tweets that
did not include features of aggression or loss
were assigned to a catch-all category.
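
The three-way sorting described above can be illustrated with a deliberately crude sketch. It assumes a keyword lookup, which is not how SAFElab's annotators or models work; the cue lists and the function name `label_tweet` are hypothetical and exist only to show the shape of the task.

```python
from collections import Counter

# Hypothetical cue lists for illustration only; not SAFElab's annotation scheme.
LOSS_CUES = {"rip", "bip"}
AGGRESSION_CUES = {"blood", "opps"}


def label_tweet(text: str) -> str:
    """Return 'aggression', 'loss' or 'other' for one tweet.

    Aggression cues are checked first; that ordering is an arbitrary
    assumption made for this sketch.
    """
    words = {w.strip(".,!?\"'").lower() for w in text.split()}
    if words & AGGRESSION_CUES:
        return "aggression"
    if words & LOSS_CUES:
        return "loss"
    return "other"


def tally(tweets: list[str]) -> Counter:
    """Count how many tweets fall into each category."""
    return Counter(label_tweet(t) for t in tweets)
```

A lookup like this would miss most of the coded language the article goes on to describe, which is why the team relied on human annotation and trained models rather than fixed word lists.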

What the researchers found was a regular
pattern of grief leading to aggression, often as
rivals taunted those in pain or used language or
imagery meant to minimize the loss of a friend
or insult the dead. “They're going through a
grieving process, and part of that process is
anger and disbelief,” says Patton. “Rival crews
or gangs are interrupting the process. That
heightens the conversation over time.”

For example, one member of Barnes’s crew
tweeted shortly after Shaw’s death, “Opps in
trouble, we want blood.” Shaw was killed by
police officers, but his fellow crew members
were enraged and wanted to take out their
anger on an opposition, or ‘opps’, crew. A rival
tweeted back, “come on our block mfs we want
dis shit.” (Original postings have been altered
by SAFElab to protect the identity of the users.)

Patton’s team found that expressions of 
grief significantly correlated with subsequent 
aggressive tweets, which happened about two 
days later, on average. Those two days could 
be a crucial window for a virtual or clinical 
intervention by counsellors, social workers or 
mental-health professionals, the researchers 
think. But combing through masses of Twit- 
ter traffic to detect and identify such signs of 
danger would be impossible. So Patton asked 
Columbia's Department of Computer Sciences 
to explore automating the search. The challenge 
caught the attention of Kathleen McKeown, a 
natural-language-processing specialist and 
founding director of the Data Science Institute 
at Columbia University in New York City. 
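
The roughly two-day gap between grief and aggression is the kind of quantity that can be estimated from labelled, timestamped posts. The sketch below is an assumption-laden illustration, not the study's statistical method: it pairs each post labelled 'loss' with the next post labelled 'aggression' in a single stream and averages the gaps. The function name `mean_lag_days` and the label strings are hypothetical.

```python
from datetime import datetime, timedelta
from statistics import mean
from typing import Optional


def mean_lag_days(events: list[tuple[datetime, str]]) -> Optional[float]:
    """Average gap, in days, from each 'loss' post to the next 'aggression' post.

    `events` holds (timestamp, label) pairs in any order.
    Returns None when no loss post is followed by an aggression post.
    """
    ordered = sorted(events)
    lags = []
    for i, (t_loss, label) in enumerate(ordered):
        if label != "loss":
            continue
        # Look forward for the first aggression post after this loss post.
        for t_next, next_label in ordered[i + 1:]:
            if next_label == "aggression":
                lags.append((t_next - t_loss) / timedelta(days=1))
                break
    return mean(lags) if lags else None
```

An alerting tool built on such an estimate would treat the window after a loss post as the period in which outreach is most useful.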

Internet banging has its own distinctive 
lingua franca, which Patton describes as “a 
combination of African American vernacular 
English, social-media speak and brilliant uses 
of punctuation and numbers”. 

It was “very different than any language that
I had ever worked on”, says McKeown. To assist
the computational interpretation, the team
brought in violence specialists (individuals
formerly involved with gangs) to annotate
hundreds of tweets. Some of the translation
was fairly obvious. A gun emoji and a devil
face, for example, represent potential threats,
and a phrase like “Got da Smithy on Me Right
Na” would mean “I’m carrying a Smith &
Wesson handgun.” Other phrases were more
opaque. “Keta smoking and thinkin of Lil B,”
was a tweet mourning the loss of Shaw, but also
meant to disrespect a rival gang member who
had been killed (see ‘Tweet translations’).

McKeown’s group created a tool that could
translate words and phrases. “We developed a
bilingual dictionary,” she says.

The first test runs of the tool accurately identified
expressions of loss and grief 62% of the
time, says Patton. By the time they published
the research, this measure had improved. “We'll
never be at 100% because social media is always
changing and the language is often new. But
this is very high for data science,” Patton says.

They expanded the data set to include about 
2 million tweets among 9,000 users, and the 
team is now training an additional AI applica- 
tion on the images that accompany posts. The 
researchers want to develop an AI application 
that can accurately identify grief and sadness 
even from facial expressions. 

“The dominant narrative is that violence is a
one-time incident like a finite moment in time.
Our research hopes to show there are pathways
to violence,” says William R. Frey, a doctoral
student who is mentored by Patton and is the
coordinator of SAFElab.

Research suggests that grief and grievances
are data points along the road to violence, says
Gene Deisinger, a psychologist, retired police
officer and threat-management expert who has
advised organizations and government agencies
on how to identify and manage risks of
violent behaviour. “We've learned that when a
person fixates [on] the need for a violent resolution,
that does increase the risk of violence.”
Deisinger sees some similarities between
Patton’s research and threat-assessment activities.
“You can actually develop some robust
models of prediction of human behaviour such
as what proportion of the group will escalate
to violent behaviour. It still begs the question
of what any individual will do,” says Deisinger.
He cautions against using group outcomes and
predictions, particularly where taking action
might interfere with constitutional rights.

TWEET TRANSLATIONS

SAFElab works with community organizations and people formerly involved with gangs to evaluate
and contextualize tweets and images posted by young people in Chicago, Illinois.

ANNOTATION AND LABELLING

The team marked up thousands of tweets and images and then labelled them as pertaining to loss,
aggression or other themes.

RIP (rest in peace) and BIP (ball in paradise)
come up commonly in tweets about loss.

The team added boxes around parts of images
pertaining to drugs, firearms, gang affiliations
(such as hand gestures and tattoos) and more.

Example tweet, 2:30am: “woke up thinkin yu was gon be here,”

Names provide clear links
to real events and people.

Example tweet, 3:15am: “im finna rollup JROY right na!”

People involved in gangs sometimes name
the drugs they use after dead rivals; to
‘smoke’ them is a sign of disrespect.

Emojis can provide additional context for
things such as aggression (devil face) and
drug use (blowing smoke).

MACHINE LEARNING

Labelled tweets are used to train natural-language-processing and computer-vision models, which sort
novel tweets into categories such as loss and aggression.

Categories shown: aggression; grief or loss; other (such as substance use).


EARLY INTERVENTIONS 

Chicago has proved an ideal setting for Patton's
experiments; the city has become a laboratory
for strategies aiming to prevent injuries
and deaths from gunshots. Cure Violence, a
programme founded by epidemiologist Gary
Slutkin at the University of Illinois at Chicago,
aims to treat gun violence as if it were an infectious
disease. It monitors activity and intervenes
to interrupt the spread of gun violence,
treats high-risk individuals and educates the
community about prevention. READI Chicago
offers employment opportunities, cognitive
behavioural therapy and support services for
young men at highest risk of violence.

One initiative that began in 2016 is the 
Institute for Nonviolence Chicago (INV), 
which was founded by Teny Gross, formerly 
of the Israel Defense Forces. He is aiming to 
partner with SAFElab to conduct field research 
on some of the intervention tools that Patton's 
group is creating. 

INV has a hyper-local approach that organ- 
izes the community around the non-violent 
principles advocated by Martin Luther King Jr. 
Its 25 outreach workers follow chatter on social 
media, but they have a limited reach, says David 
Cassel, INV’s director of strategy and organiza- 
tional advancement. “They see conflicts hap- 
pening in social media and then see the actual 
gun violence in the community. But it’s very 
difficult for them to monitor social media and 
do their jobs,” says Cassel. The organization 
hopes to get funding to test strategies for using 
SAFElab’s tools, possibly to send automatic 
alerts to case workers, who could then contact 
people at risk of carrying out retaliatory attacks. 

Patton’s SAFElab project joins a growing
body of research that uses AI and social data to
predict public-health outcomes. An application
called nEmesis, for example, developed by the
University of Rochester in New York, also uses
AI and natural-language processing to search
Twitter. This program looks for tweets about
food poisoning to help identify the source of an
outbreak for public-health investigators.

The University of California Institute for 
Prediction Technology, based in Los Angeles, 
has been working on another problem. Scien- 
tists have used a statistical model and machine 
learning to predict heroin overdoses on the 
basis of Google searches for prescription and 
non-prescription opioids.

The study’s lead author, Sean Young, says 
that Patton's research is “very promising”, but
he suggests there are some limitations. “How 
do we get a sense of the validity of the infor- 
mation?” he asks. How does one separate real 
threats from boastful swagger? 

Patton acknowledges there is a degree of 
showmanship on social media. “People who 
live in communities impacted by violence may 
present themselves as being in a gang, but they 
are simply acting or performing,” he says. 

As SAFElab and INV look to secure fund- 
ing for developing and testing tools for the real 
world, Patton and his colleagues hope that they 
can learn more from the short life of Gakirah 
Barnes and others like her. They have several 
new projects, he says, “including expanding 
Twitter analysis beyond aggression and grief 
to look at other factors associated with youth 
gun violence, such as substance use and men- 
tal health”. Members of the team are now using 
network analysis to look at the links between 
online chatter and real-world behaviour. They 
are also experimenting with virtual reality to 
teach young people how to navigate social 
media and limit their exposure to violence. 

Patton worries that some people might try to 
use social-media data in a discriminatory way 
by improperly trying to predict problems and 
even police activity in marginalized communi- 
ties. But he sees many more positive aspects to 
the work: “I think the power of social media is 
that it gives you depth, vulnerability and multiple 
perspectives.” He sees young people relating to 
each other and seeking out comfort and support. 
Strengthening positive connections in the com- 
munity could be important for preventing the 
kind of violence that has ended so many young 
lives. He also hopes to broaden the discussion 
beyond violence. “With Gakirah, of course, we 
found a young woman who loved, who hurt, 
who was excited and who was in pain. That’s a 
normal person, not just a gang member.”




Rod McCullom is a science journalist in 
Chicago, Illinois. 
THE QUEST TO CONQUER THE SPACE JUNK PROBLEM


Zombie satellites, rocket shards and collision debris are creating 
major traffic risks in orbits around Earth. Researchers are working to 
reduce the threats posed by more than 20,000 objects in space. 


BY ALEXANDRA WITZE 


BUSY SKIES

There are currently more than 20,000
objects in orbit around Earth, according to
catalogues that track operational satellites,
dead ones and other human-made debris,
such as pieces from rockets. And the problem
is growing quickly: more than 1,800 new [...]

A visualization by NASA depicts the traffic of objects in orbits around Earth.

SOURCE: ESA ANNUAL SPACE ENVIRONMENT REPORT. EARTH DEBRIS IMAGE: NASA GODDARD SPACE FLIGHT CENTER/JSC


On Monday 2 July, the CryoSat-2 spacecraft was
orbiting as usual, just over 700 kilometres above 
Earth’s surface. But that day, mission controllers 
at the European Space Agency (ESA) realized 
they had a problem: a piece of space debris was 
hurtling uncontrollably towards the €140-million 
(US$162-million) satellite, which monitors ice on the planet. 

As engineers tracked the paths of both objects, the chances of a colli- 
sion slowly increased — forcing mission controllers to take action. On 
9 July, ESA fired the thrusters on CryoSat-2 to boost it into a higher orbit. 
Just 50 minutes later, the debris rocketed past at 4.1 kilometres a second. 

This kind of manoeuvre is becoming much more common each year, 
as space around Earth grows increasingly congested. In 2017, commercial 
companies, military and civil departments and amateurs lofted more than 
400 satellites into orbit, over 4 times the yearly average for 2000-2010. 
Numbers could rise even more sharply if companies such as Boeing, One- 
Web and SpaceX follow through on plans to deploy hundreds to thou- 
sands of communications satellites into space in the next few years. If all 
these proposed ‘megaconstellations’ go up, they will roughly equal the 
number of satellites that humanity has launched in 
the history of spaceflight. 

All that traffic can lead to disaster. In 2009, a 
US commercial Iridium satellite smashed into an 
inactive Russian communications satellite called 
Cosmos-2251, creating thousands of new pieces 
of space shrapnel that now threaten other satel- 
lites in low Earth orbit — the zone stretching up to 
2,000 kilometres in altitude. Altogether, there are 
roughly 20,000 human-made objects in orbit, from 
working satellites to small shards of solar panels 
and rocket pieces. And satellite operators can't steer 
away from all potential collisions, because each 
move consumes time and fuel that could otherwise 
be used for the spacecraft’s main job. 

Concern about space junk goes back to the beginning of the satellite 
era, but the number of objects in orbit is rising so rapidly that research- 
ers are investigating new ways of attacking the problem. Several teams 
are trying to improve methods for assessing what is in orbit, so that satel- 
lite operators can work more efficiently in ever-more-crowded space. 
Some researchers are now starting to compile a massive data set that 
includes the best possible information on where everything is in orbit. 
Others are developing taxonomies of space junk — working out how 
to measure properties such as the shape and size of an object, so that 
satellite operators know how much to worry about what’s coming their 
way. And several investigators are identifying special orbits that satellites 
could be moved into after they finish their missions so they burn up in 
the atmosphere quickly, helping to clean up space. 

The alternative, many say, is unthinkable. Just a few uncontrolled 
space crashes could generate enough debris to set off a runaway cascade 
of fragments, rendering near-Earth space unusable. “If we go on like 
this, we will reach a point of no return,” says Carolin Frueh, an astro- 
dynamical researcher at Purdue University in West Lafayette, Indiana. 


DIRTYING ORBITS 

Astronomers and others have worried about space junk since the 1960s, 
when they argued against a US military project that would send millions 
of small copper needles into orbit. The needles were meant to enable 
radio communications if high-altitude nuclear testing were to wipe out 
the ionosphere, the atmospheric layer that reflects radio waves over long 
distances. The Air Force sent the needles into orbit in 1963, where they 
successfully formed a reflective belt. Most of the needles fell naturally 
out of orbit over the next three years, but concern over ‘dirtying’ space 
nevertheless helped to end the project. 

It was one of the first examples of the public viewing space as a land- 
scape that should be kept clean, says Lisa Rand, a historian of science in 
Philadelphia, Pennsylvania, and a fellow with the American Historical 
Association and NASA. 


Since the Soviet Union launched the first satellite, Sputnik, in 1957,
the number of objects in space has surged, reaching roughly 2,000 in
1970, about 7,500 in 2000 and about 20,000 known items today. The
two biggest spikes in orbital debris came in 2007, when the Chinese
government blew up one of its satellites in a missile test, and in the
2009 Iridium–Cosmos collision. Both events generated thousands of
fresh fragments, and they account for about half of the 20-plus satellite
manoeuvres that ESA conducts each year, says Holger Krag, head of
ESA’s space-debris office in Darmstadt, Germany.

Each day, the US military issues an average of 21 warnings of potential 
space collisions. Those numbers are likely to rise dramatically next year, 
when the Air Force switches on a powerful new radar facility on Kwajalein 
in the Pacific Ocean. That facility will allow the US military to detect 
objects smaller than today’s 10-centimetre limit for low Earth orbit, and 
this could increase the number of tracked objects by a factor of five. 

Even as our ability to monitor space objects increases, so too does the 
total number of items in orbit. That means companies, governments and 
other players in space are having to collaborate in new ways to avoid a 
shared threat. Since the 2000s, international groups such as the Inter- 
Agency Space Debris Coordination Committee 
have developed guidelines for achieving space sus- 
tainability. Those include inactivating satellites at 
the end of their useful lifetimes by venting leftover 
fuel or other pressurized materials that could lead 
to explosions. The intergovernmental groups also 
recommend lowering satellites deep enough into the 
atmosphere that they will burn up or disintegrate 
within 25 years. 

But so far, only about half of all missions have 
abided by this 25-year guideline, says Krag. Opera- 
tors of the planned megaconstellations say they will 
be responsible stewards of space, but Krag worries 
that the problem could increase, despite their best 
intentions. “What happens to those that fail or go 
bankrupt?” he asks. “They are probably not going to spend money to 
remove their satellites from space.” 


TRAFFIC COPS FOR SPACE 

In theory, satellite operators should have plenty of room for all these 
missions to fly safely without ever nearing another object. So some 
scientists are tackling the problem of space junk by trying to understand 
where all the debris is to a high degree of precision. That would alleviate 
the need for many unnecessary manoeuvres that today are used to avoid 
potential collisions. “If you knew exactly where everything was, you would 
almost never have a problem,” says Marlon Sorge, a space-debris specialist 
at the Aerospace Corporation in El Segundo, California. 

The field is called space-traffic management, because it's analogous to 
managing traffic on the roads or in the air. Think about a busy day at an 
airport, says Moriba Jah, an astrodynamicist at the University of Texas at 
Austin: planes line up in the sky like a string of pearls, landing and taking 
off close to one another in a carefully choreographed routine. Air-traffic 
controllers know the location of the planes down to 1 metre in accuracy. 

The same can't be said for space debris. Not all objects in orbit are 
known, and even those included in databases are tracked to varying 
levels of precision. On top of that, there is no authoritative catalogue 
that accurately lists the orbits of all known space debris. 

Jah illustrates this with a web-based database that he developed, called
ASTRIAGraph. It draws on several sources, such as catalogues maintained
by the US and Russian governments, to visualize the locations of objects
in space. When he types in an identifier for a particular space object,
ASTRIAGraph draws a purple line to designate its orbit.

Only this doesn’t quite work for a number of objects, such as
a Russian rocket body launched in 2007 and designated in the
database as object number 32280. When Jah enters that number,
ASTRIAGraph draws two purple lines: the US and Russian sources
contain two completely different orbits for the same object. Jah says
that it is almost impossible to tell which is correct, unless a third
source of information could help to cross-correlate the correct location.
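
The disagreement Jah describes amounts to a cross-catalogue consistency check. Here is a minimal sketch of one way to flag it, assuming each catalogue entry is reduced to two orbital elements. The `OrbitRecord` fields, the tolerances and the function name are hypothetical and do not reflect ASTRIAGraph's actual data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OrbitRecord:
    """One catalogue's entry for an object (hypothetical, simplified schema)."""
    object_id: int
    source: str
    semi_major_axis_km: float
    inclination_deg: float


def find_conflicts(records, sma_tol_km=10.0, inc_tol_deg=0.1):
    """Return {object_id: [sources]} for objects whose catalogues disagree.

    Two entries disagree when their semi-major axes or inclinations differ
    by more than the given tolerances (illustrative values, not standards).
    """
    by_id = {}
    for rec in records:
        by_id.setdefault(rec.object_id, []).append(rec)

    conflicts = {}
    for object_id, recs in by_id.items():
        smas = [r.semi_major_axis_km for r in recs]
        incs = [r.inclination_deg for r in recs]
        sma_spread = max(smas) - min(smas)
        inc_spread = max(incs) - min(incs)
        if sma_spread > sma_tol_km or inc_spread > inc_tol_deg:
            conflicts[object_id] = sorted(r.source for r in recs)
    return conflicts
```

A check like this only says that sources disagree. As Jah notes, deciding which orbit is right would take a further independent source.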

ASTRIAGraph currently contains some, but not all, of the major 
sources of information about tracking space objects. The US military 
catalogue — the largest such database publicly available — almost 
certainly omits information on classified satellites. The Russian 
government similarly holds many of its data close. Several commercial 
space-tracking databases have sprung up in the past few years, and most 
of those do not share openly. 

Jah describes himself as a space environmentalist: “I want to make 
space a place that is safe to operate, that is free and useful for future 
generations.” Until that happens, he argues, the space community will 
continue devolving into a tragedy of the commons, in which all space- 
flight operators are polluting a common resource. 

He and other space environ- 
mentalists are starting to make 
headway, at least when it comes 
to US space policy. Jah testified 
on space-traffic management in 
front of Congress last year, at the 
invitation of Ted Cruz, a Repub- 
lican senator from Texas who co- 
introduced a space-regulations bill 
this July. In June, President Donald 
Trump also signed a directive on 
space policy that, among other 
things, would shift responsibil- 
ity for the US public space-debris 
catalogue from the military to a 
civilian agency — probably the 
Department of Commerce, which 
regulates business. 

The space-policy directive is a
rare opportunity to discuss space
junk at the highest levels of the US
government. “This is the first time we're really having this conversation
in a serious fashion,” says Mike Gold, vice-president for regulatory, policy
and government contracts at Maxar Technologies of Westminster,
Colorado, which owns and operates a number of satellites.


THE ORBITING DEAD 

The space around Earth is filled with zombies: some 95% of all objects in 
orbit are dead satellites or pieces of inactive ones. When someone oper- 
ating an active satellite gets an alert about an object on a collision course, 
it would be helpful to know how dangerous that incoming debris is. 
“With more and more objects, and the uncertainties we currently have, 
you just get collision warnings no end,” says Frueh. (Micrometeorites
represent a separate threat and can't be tracked at all.) 

To assess the risk of an impending collision, satellite operators need to 
know what the object is, but tracking catalogues have little information 
about many items. In those cases, the military and other space trackers use 
telescopes to gather clues in the short period before a potential collision. 

Working with the Air Force, Frueh and her colleagues are developing 
methods to rapidly decipher details of orbiting objects even when very 
little is known about them. By studying how an object reflects sunlight 
as it passes overhead, for instance, she can deduce whether it is tum- 
bling or stable — a clue to whether or not it is operational. Her team 
is also experimenting with a machine-learning algorithm that could 
speed up the process of characterizing items, work she will describe on 
14 September at a space-tracking meeting in Maui, Hawaii. 

Once researchers know what an orbiting object is made of, they have 
a number of potential ways to reduce its threat. Some sci-fi-tinged 
proposals involve using magnets to sweep up space junk, or lasers to 
obliterate or deflect debris in orbit. In the coming weeks, researchers 
at the University of Surrey in Guildford, UK, will experiment with a 
net to ensnare a test satellite. The project, called RemoveDEBRIS, will 
then redirect the satellite into an orbit that will re-enter the atmosphere. 

But such active approaches to cleaning up space junk aren't likely to be
practical over the long term, given the huge number of objects in orbit. So
some other experts consider the best way of mitigating space junk to be a
passive approach. This takes advantage of the gravitational pulls of the Sun
and the Moon, known as resonances, that can put the satellites on a path
to destruction. At the University of Arizona in Tucson, astrodynamicist
Aaron Rosengren is developing ways to do so.

Tiny CubeSats are released from the International Space Station in 2012.

26 | NATURE | VOL 561 | 6 SEPTEMBER 2018

Rosengren first came across the idea when studying the fates of satel- 
lites in medium Earth orbit (MEO). These travel at altitudes anywhere 
between about 2,000 kilometres up, where low Earth orbit ends, and 
35,000 kilometres up, where geostationary orbits begin. 
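
The altitude bands mentioned so far can be written down as a small lookup. The sketch below uses the rounded boundaries given in the text (2,000 and 35,000 kilometres) and assumes a roughly circular orbit described by a single altitude. The function name and the labels it returns are illustrative choices.

```python
LEO_CEILING_KM = 2_000    # upper edge of low Earth orbit, as given in the text
GEO_ALTITUDE_KM = 35_000  # rounded altitude where geostationary orbits begin


def orbit_regime(altitude_km: float) -> str:
    """Classify a circular orbit by altitude above Earth's surface."""
    if altitude_km < 0:
        raise ValueError("altitude must be non-negative")
    if altitude_km <= LEO_CEILING_KM:
        return "LEO"
    if altitude_km < GEO_ALTITUDE_KM:
        return "MEO"
    return "GEO or higher"
```

For example, `orbit_regime(700)` places a satellite at CryoSat-2's altitude in low Earth orbit. A strongly elongated orbit would cross several bands and cannot be described by one altitude.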

Satellites in low Earth orbit can be disposed of by forcing them to 
re-enter the atmosphere, and most satellites in the less heavily traf- 
ficked geostationary region can be safely placed in ‘graveyard’ orbits 
that never interact with other 
objects. But in MEO, satellite tra- 
jectories can be unstable over the 
long term because of gravitational 
resonances. 

An early hint that spacecraft
operators could harness this
phenomenon came from ESA’s
INTEGRAL γ-ray space telescope,
which launched in 2002.
INTEGRAL travels in a stretched-out
orbit that spans all the way
from low Earth orbit, through
MEO, and into geostationary orbit.
It would normally have remained
in space for more than a century,
but in 2015, ESA decided to tweak
its orbit. With a few small thruster
burns, mission controllers placed
it on a path to interact with gravitational
resonances. It will now
re-enter the atmosphere in 2029, rather than decades later.

In 2016, Rosengren and his colleagues in France and Italy showed 
that there is a dense web of orbital resonances that dictates how objects 
behave in MEO (J. Daquin et al. Celest. Mech. Dyn. Astr. 124, 335-366; 
2016). Rosengren thinks this might offer a potential solution. There are 
paths in this web of resonances that lead not to MEO, but directly into 
the atmosphere, and operators could take advantage of them to send sat- 
ellites straight to their doom. “We call it passive disposal through reso- 
nances and instabilities,” says Rosengren. “Yeah, we need a new name.” 

Other researchers have explored the concept before, but Rosengren 
is trying to push it into the mainstream. “It’s one of the newer things in 
space debris,” he says. 

These disposal highways in the sky could be easy to access. At a space
conference in July in Pasadena, California, Rosengren and his colleagues
reported on their analysis of US Orbiting Geophysical Observatory satellites
from the 1960s. The scientists found that changing the launch date or
time by as little as 15 minutes could lead to huge differences in how long a
satellite remains in orbit. Such information could be used to help calculate
the best times to depart the launch pad.

Being proactive now could head off a lot of trouble down the road, as 
operators of satellites such as CryoSat-2 have found. When ESA decided 
to take evasive action in early July, its engineers had to scramble and work 
through the weekend to get ready for the manoeuvre. Once the space junk 
had safely flown by, CryoSat-2 took a few days to get back into its normal 
orbit, says Vitali Braun, a space-debris engineer with ESA. 

But the alerts didn’t stop coming. In the weeks that followed, mission
controllers had to shift various satellites at least six times to dodge
debris. And on 23 August, they nudged the Sentinel-3B satellite out
of the way of space junk for the first time. It had been in orbit for only
four months.


Alexandra Witze is a correspondent for Nature based in Boulder, 
Colorado. 
Smog envelops Santiago in Chile.


Five steps to improve 
air-quality forecasts 


A worldwide monitoring and modelling network would reduce the dramatic toll of 
air pollution on health and food production, urge Rajesh Kumar and colleagues. 


Seven million people die every year from
the effects of air pollution. More than
90% of such deaths are in developing
countries. Across southern Asia, levels of
fine particulate matter (PM2.5) and surface
ozone exceed the World Health Organization
(WHO) limits for much of the year. Ozone
damage to crops and plants, especially to
soya beans, wheat and maize (corn), results
in 79 million to 121 million tonnes of lost
produce globally, at a cost of US$11 billion to
$18 billion. India’s crop losses alone would
feed 94 million people. All this costs the
world’s economy US$5 trillion per year.

But air pollution often goes unmonitored. 
Some of the fastest-growing cities in Africa, 
including Lagos, Kinshasa, Abidjan and 
Dakar, have no air-quality alert systems. Gov- 
ernments can be reluctant to acknowledge the 
problem, or lack the tools to address it. There 
is no international strategy for dealing with 
the issue. And few people are trained in how 
to collect and interpret air-quality data. 

Improvements can take decades. It took the
US city of Atlanta, Georgia, 15 years to reduce
emissions from power plants by around 80%
and from traffic by up to 90%, avoiding more
than 50,000 hospital visits for asthma and
lung diseases (ref. 6). Los Angeles in Califor-
nia took 50 years to reduce ozone levels by
two-thirds (ref. 7).

Forecasts of hazardous air pollution are 
crucial to help reduce exposure. Vulnerable 
people can avoid strenuous outdoor activi- 
ties or stay indoors. Schools might restrict 
outdoor sports activities, parents can limit 
the time their children spend outdoors and 
doctors might advise their patients to stay 
inside when levels are high. In Canada,
for example, daily forecasts have helped
to decrease the number of asthma-related 
emergency-department visits by 25% (ref. 8). 

City managers can also take steps to limit 
pollution. For example, Santiago in Chile 
restricts driving and certain industries dur- 
ing predicted periods of high pollution. These 
measures have reduced the PM2.5 concentra-
tions by 20% and avoided around 8 deaths
each day (ref. 9). However, efforts to ban cars on
polluted days in Delhi have had little effect (ref. 10).

Many parts of North America and 
Europe provide daily broadcasts of air- 
quality forecasts. But across much of Asia, 
Africa and South America, smog still 
arrives unannounced. Predictions require 
advanced computer models and regional 
weather forecasts; both are lacking across 
the developing world. 

Access to such systems needs to be made 
global. Local air quality is affected by the 
intercontinental transport of air pollutants 
and distant weather events such as the El Niño
Southern Oscillation (ENSO). Denser sam- 
pling of pollution data would tell air-quality 
managers which activities are most danger- 
ous and where the worst pollution is coming 
from — road traffic or industry, say. 

Most forecasts of air quality cover two 
to five days. Extending this to seven to ten 
days — similar to alert periods for hurri- 
canes and floods — would give communities 
and hospitals more time to prepare. Seasonal 
forecasts would give farmers the chance to 
shift their planting and harvesting dates and 
choose resistant crops. Such extensions, as 
well as predictions that span seasons, will 
be helped by the ongoing efforts to improve 
weather forecasts and predict emissions 
from wildfires. 

Rolling out a global forecasting system 
for air quality will take five advances: 
expanding observing networks; improv- 
ing models; devising metrics and tools 
for quantifying air pollution; disseminat- 
ing the information; and training experts. 
These steps align with the priorities for 
weather and climate set out by the World 
Meteorological Organization (WMO) (ref. 11).

We call on the WMO, the WHO, the 
United Nations Environment Programme 
(UNEP) and the Food and Agriculture 
Organization of the UN (FAO) to lead the 
development of an international programme 
for air-quality monitoring and prediction. 
Financial institutions such as the World 
Bank and non-governmental organizations 
should support air-quality initiatives. 


FIVE STEPS 

Monitoring. Levels of ozone, PM2.5, carbon
monoxide, nitrogen oxides, sulfur diox- 
ide, aerosols and other pollutants need 
to be tracked globally, at least daily and 
ideally hourly. Satellite, aircraft, balloon 
and ground-based instruments will all be 
needed. Upcoming geostationary missions
such as the US Tropospheric Emissions:
Monitoring of Pollutants; the Korean Aero- 
space Research Institute (KARI) Geostation- 
ary Environmental Monitoring Spectrometer; 
and the European Copernicus Programme's 
Sentinel-4 instruments will soon be able to 
track pollutants continuously over North 
America, Asia and Europe. Governments 
should launch more such satellites or drones 
to cover other continents. The costs (of 
$50 million to $100 million) are small rela- 
tive to the economic losses from air pollution. 

On the ground, air-quality monitoring 
networks need to be established in Africa, 
South America and southeast Asia. With 
about 1 station per urban neighbourhood 
and around 100 stations providing a
baseline outside cities, these could
complement the hundreds of Global
Earth Observatory stations that have
been proposed for tracking many environ-
mental compounds (ref. 12). Local scientists will
need to be trained to gather the data, with 
support from governments and multilateral 
exchange programmes for researchers. 

All countries should share their air-quality 
information. Some do already; many do 
not. There is no international authority 
that organizes this. The WMO facilitates 
such sharing for weather data, and it should 
coordinate an international agreement and 
devise quality controls and operating guide- 
lines for air quality, too. Metadata should be 
included (such as the height of a detector 
above ground, or its distance from the near- 
est building, tree, road or industrial stack) as 
well as measurement errors. 


Schoolchildren walk through smog in New Delhi.


Modelling. Researchers must improve
regional and urban air-quality prediction
models. These combine weather and
atmospheric composition, emissions records 
and ground and space observations. The 
models should pinpoint major sources 
of pollution in cities, such as buildings or 
roads. And they should track daily cycles, 
for example in pollution from traffic, as well 
as decadal changes in weather patterns and 
emissions. Natural emissions, including 
those from wildfires, must be accounted for. 

Uncertainties must be quantified to 
avoid false alarms or missed episodes. 
Inventories that collate emissions from 
various sources (such as residential and 
industrial buildings, power, traffic, agri- 
culture and forest fires) are a major source 
of error, especially in developing countries 
where reporting is sparse. Such data are 
often years out of date. Biased measure- 
ments and poorly understood physical 
and chemical processes (such as organic 
chemistry in the atmosphere, aerosol 
microphysics and deposition mechanisms) 
also introduce uncertainties. 

Many developing countries do not have 
the computing power to model air quality. 
This capability should be added to existing 
weather-forecasting systems. Large institu- 
tions with advanced supercomputers should 
run the global forecasts, such as the Euro- 
pean Centre for Medium-Range Weather 
Forecasts (ECMWF) and the US National
Center for Atmospheric Research (NCAR). 
National meteorological, air-quality and 
environmental services must tailor models 
to local needs. Cloud computing can 
broaden access, and the ECMWF is trialling 
this for its weather forecasts. 


Interpretation. Researchers need to 
understand the processes that control air pol- 
lution in particular places and over the longer 
term. They should develop tools to help 
policymakers and city managers evaluate
the impacts of regulations, economic policies
and urbanization on air quality.

An air-pollution monitoring device in Nepal.

Broad collaborations will be needed. For 
example, the Partnership with China on 
Space Data (PANDA) project is bringing 
together air-quality scientists and officials 
from Europe and China to develop warning 
systems for 28 Chinese cities. Similar part- 
nerships are springing up in South America 
and India, but still need to be developed for 
Africa and southeast Asia. 


Dissemination. National meteorological, 
air-quality and environmental service 
providers should calculate health and 
air-quality indices based on pollutant 
concentrations, similar to those used in 
North America and Europe. For instance, 
the Air Quality Index published by the US 
Environmental Protection Agency reports 
how polluted the air is (on a scale of 0-500) 
and the health effects (such as unhealthy 
for sensitive groups, or hazardous for eve- 
ryone). These indices should be explained 
and released to the public and to hospitals 
through websites, mobile messaging and 
so on. London, UK, and Riga, Latvia, issue 
text alerts on air quality, for example. Air 
quality could be added to other alert apps, 
such as one indicating risks from lightning 
strikes in Andhra Pradesh, India. Technol- 
ogy companies are showing more interest 
in air-quality monitoring and forecasting. 
Google, for instance, has been mapping air 
quality since 2014 with its street-view cars,
and IBM has launched a global initiative
called Green Horizons, in which air-quality 
forecasting is a key component. 
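
As a rough illustration of how a pollutant concentration becomes an index value, the sketch below applies the piecewise-linear interpolation that the US Environmental Protection Agency uses for its Air Quality Index. The breakpoint table is an assumption: it reflects the 24-hour PM2.5 values in use before the agency's 2024 revision, and any operational service should take its breakpoints from the current official table. The function and variable names are hypothetical.

```python
import math

# Assumed breakpoints for 24-hour PM2.5 (micrograms per cubic metre):
# (conc_low, conc_high, index_low, index_high). These follow the EPA table
# in use before 2024 and are illustrative only.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]

# Upper index bound of each health category.
CATEGORIES = [
    (50, "Good"),
    (100, "Moderate"),
    (150, "Unhealthy for sensitive groups"),
    (200, "Unhealthy"),
    (300, "Very unhealthy"),
    (500, "Hazardous"),
]


def category_for(aqi):
    """Return the health category label for an index value from 0 to 500."""
    for upper, label in CATEGORIES:
        if aqi <= upper:
            return label
    raise ValueError("index value is beyond the 0-500 scale")


def pm25_to_aqi(concentration):
    """Convert a 24-hour PM2.5 concentration to (index, category)."""
    if concentration < 0:
        raise ValueError("concentration cannot be negative")
    # Truncate to one decimal place so every value falls inside a segment.
    c = math.floor(concentration * 10 + 1e-9) / 10
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            # Linear interpolation within the matching segment.
            aqi = round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
            return aqi, category_for(aqi)
    raise ValueError("concentration is beyond the top of the index scale")
```

A forecasting service could apply a function like this to each predicted concentration before sending a text alert, so that the public sees a single number and a plain-language health category rather than raw measurements.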


Training and education. Students, 
environmental engineers and early-career 
scientists in developing countries need 
training in how to measure air quality and 
climate change and interpret the results. 
Exchanges of scientists between developed 
and developing countries could be arranged 
through programmes such as the European 
Commission’s Marie Skłodowska-Curie
Actions. Massive open online courses are a
great opportunity to gain detailed first-hand
insight. In October, for instance, the Euro- 
pean Organisation for the Exploitation of 
Meteorological Satellites and the ECMWF 
are running a course on the free platform 
Iversity, called Monitoring Air Quality 
from Space, which will explain more about 
the data being gathered and delivered in 
the Copernicus Programme. The public 
and authorities need to understand how to 
change behaviours, for example by avoiding 
driving when pollution is bad. Incentives 
will be needed, such as subsidies for people 
using public transport. 

As a first step, a summit sponsored by the 
WMO, WHO, UNEP and FAO could begin 
to design a global strategy for reducing 
deaths caused by air pollution.


Rajesh Kumar is a project scientist at 
the National Center for Atmospheric 
Copernicus Atmosphere Monitoring
for Medium-Range Weather Forecasts in
Reading, UK. James H. Crawford is NASA’s
senior scientist for tropospheric chemistry at

A waterfall in Grand Canyon National Park, Arizona.


MATERIALS SCIENCE 


Wine, water, oil and spit 


Derek Lowe enjoys Mark Miodownik’s sparkling journey through liquids. 


We humans can’t help but have
our views of reality skewed by
our own experience. We take
the physical conditions around us as the
normal state of affairs and regard others as 
extreme. The chemist’s ‘standard tempera-
ture and pressure’ of 0 °C under 10⁵ pascals
makes perfect sense as a reference in the 
human frame, but is quite unusual in the 
larger scheme of things. The rest of the Solar 
System, for example, often features tempera- 
tures and pressures much higher or much 
lower; the rest of (mostly empty) outer space 
is even worse. It’s the surface of Earth that is 
the outlier. 

These uncommon conditions mean 
we experience something very rare in the 
higher and lower expanses of temperature 
and pressure: a wide variety of liquids. It is 
this odd realm that materials scientist Mark 
Miodownik explores in Liquid, his enjoyable 
successor to his 2013 paean to materials, 
Stuff Matters. 


Liquid: The Delightful and Dangerous Substances That Flow Through Our Lives
MARK MIODOWNIK
Viking (2018)

The book explores the histories, structures and properties
of many different sorts of liquid, with excursions into the
larger topics each brings up. Miodownik organizes his narrative
around the conceit of an aeroplane journey, with various
incidents and episodes setting off trains of thought. Alcohol
makes an early appearance by way of the drinks trolley.
Others are introduced through ocean waves 
observed below, the soap dispenser in the 
lavatory, the thought of refrigerants in the 
air-conditioning system, and so on. The 
framework is reasonably effective, although 
it gets a bit wearing, and a few digressions
are surprisingly lengthy. (Then again, it is a
transatlantic flight.) 

The liquid state is a small part of most 
phase diagrams, a narrow range between 
a large solid zone and a large gaseous one. 
Conditions have to be just right for a sub- 
stance to condense out of the gas phase but 
not to firm up into some kind of solidified 
mass. And for many simple compounds, 
those liquid conditions manage to overlap 
with how we're used to seeing and handling 
them — which is why a plane journey can 
include so many useful examples. The one 
we all know best, of course, is water. The dis- 
covery of liquid water on another planet or 
moon, even far below the surface, somehow 
makes that world seem more real to us. But, 
as Miodownik explains, it’s one of the oddest 
liquids of all. 

H₂O is strangely sticky and viscous for
a molecule with such a small molecular 
weight. It has abnormally high melting
and boiling points compared with anything
chemically similar, such as ammonia or 
hydrogen sulfide. When it freezes, its solid 
phase is actually less dense than its liquid 
one. That relatively rare characteristic leads 
to ice cubes and icebergs floating instead of 
sinking as any normal solid phase should 
do. (If there are sentient creatures some- 
where in the Universe living next to lakes 


Books in brief 


Five Photons 

James Geach REAKTION (2018) 

Light illuminates cosmic origins and decodes quotidian realities. But 
what is it? This deft primer by astrophysicist James Geach captures 
the elusive electromagnetic wave in five processes. His meditation 
on ‘old’ light takes us back to the singularity: the “cosmic seed” that 
expanded into the Big Bang. A study of starlight plunges us into 

the seething stellar surface. We peruse dark energy, radio waves 

and quasars — beacon-like galaxies in which supermassive black 


holes feed off interstellar gas and release vast amounts of energy. A 
masterclass in elucidating hard science with elegance and brevity. 


of superfluid liq- 
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great deal of inter- 
esting information 
in accurate, readable form. As a chemist, 
I find it a relief to read such an overview 
without being distracted by mischaracter- 
ized or oversimplified details. Solid-state 
physicists and materials scientists will also 
celebrate Miodownik’s excellent efforts at 
tying the everyday properties of liquids to 
their molecular structure. He provides many 
vivid examples: among these are saliva (dur- 
ing the flight’s meal service, naturally) and 
jet fuel, which he notes is not an explosive, 
but still has more chemical energy per unit 
volume than nitroglycerine. 

One of the things that chemistry and 
physics teach is that the information given 
by our senses is only a small part of the 
story: water is wet, our fingers can tell us 
that much. What we don't feel are the water 
molecules themselves, interacting with the 
protein surfaces of our skin. Their very 
atoms and electron clouds come within 
range of each other, attracting and repel- 
ling and adding up to the sensations that 
our vivid (but often crude) senses interpret. 
That hidden world underlies every object 
we see and handle. Liquid gives readers a 
sense of this — no small feat. 

And that brings up the question of 
who might read it. As with Stuff Matters, 
Miodownik is inspiring those in search 
of science in an accessible, entertaining 
format. Today, materials scientists are 
preparing exotic fluids packed with nano- 
particles that can turn them into magnets 
or optical sensors, and nanotechnologists 
and molecular biologists are exploring the 
behaviour of water and other liquids on 
very small scales. Liquid will come in very 
useful for people eager to understand these 
advances. 


Derek Lowe has worked in early-stage drug 
discovery for decades. His In the Pipeline is 
one of the longest-running science blogs. 
e-mail: derekb.lowe@gmail.com 


The Dinosaur Artist 

Paige Williams HACHETTE (2018) 

Who owns fossils? That vexed question lies at the heart of this 
exposé of the global trade in dinosaur remains — a messy meeting- 
place of commercial fossil collectors, palaeontologists, wealthy 
enthusiasts and natural-history museums. New Yorker staff writer 
Paige Williams’s packed account centres on former Mongolian 
president Tsakhiagiin Elbegdorj, US dinosaur hunter and restorer 
Eric Prokopi and a costly Tarbosaurus bataar fossil. An astonishing 
tangle of financial gain, national identity, scientific fervour and, 
above all, the obsessional need to possess pieces of the past. 


Through a Glass Brightly 

David PR Barash OXFORD UNIVERSITY PRESS (2018) 

As a species, we seem to be unable to shake off the idea of our 
exceptionalism. Yet science regularly trounces such ideas, argues 
evolutionary biologist David Barash in this briskly erudite study. 
Barash punctures human paradigms such as the ‘anthropic 
principle’, rationality and even selfhood, marshalling considerable 
research and considered reasoning as he goes. He concludes, rather 
splendidly, that the loss of such illusions flings open the door “to do 
something really extraordinary: to see ourselves as we really are” 
and use that knowledge to behave with more humanity. 


Vaquita: Science, Politics, and Crime in the Sea of Cortez 

Brooke Bessesen ISLAND (2018) 

The world’s smallest cetacean, the vaquita (Phocoena sinus), is also 
the most endangered marine mammal on the planet, found solely
in northern Mexico’s Gulf of California. In this intrepid conservation
detective story, marine biologist Brooke Bessesen deconstructs the 
species’ demise, showing how the tiny porpoises drown in gillnets 
used for poaching a prized black-market fish, Totoaba macdonaldi. As 
she shows, the effort to conserve remaining vaquitas is a torturously 
uncertain challenge — but ever driven by the idea, articulated by field 
biologist George Schaller, that “we cannot recover a lost world”. 


City Unseen: New Visions of an Urban Planet 

Karen Seto and Meredith Reba YALE UNIVERSITY PRESS (2018) 

Cities are a tug-of-war between nature and humanity — their 
configuration shaped by topography even as they mould the 
environment in and around them. This stunning study by 
Karen Seto and Meredith Reba explores this uneasy symbiosis 
through surreally hued satellite images of 100 cities. Snaps of 
Phoenix, Arizona, taken 31 years apart reveal serious urban 
sprawl, and a shot of grain fields around Semikarakorsk, Russia, 
is a controlled riot of colour and line with the verve of early 
modernist art. Barbara Kiser 
STRUCTURAL BIOLOGY


From ribosome to Royal Society 


Georgina Ferry enjoys Venki Ramakrishnan’s account of his road to the Nobel Prize. 


In 1968, the molecular biologist James
Watson electrified the staid world of
scientific biography by publishing The
Double Helix, his memoir of the discovery of
DNA’s structure. Its irreverent style shocked
and delighted readers in equal measure. And 
it changed how many think about science, by 
presenting it as a race in which winners got 
Nobel prizes and losers got nothing. 

Fifty years later, structural biologist, Nobel 
laureate and Royal Society president Venki 
Ramakrishnan tells the story of his own 
marathon. In Gene Machine, he thoughtfully 
embeds his trajectory in a wider meditation 
on how scientists make the decisions that lead 
to success or failure — and on how they strug- 
gle to solve complex problems. “Scientists will 
collaborate or compete depending on what is 
in their self-interest,” he writes, adding that 
competition can be “good for science, even if 
it isn’t so great for scientists”.

Ramakrishnan recognizes from the outset 
that, although most people today have at least 
a vague idea that DNA is the carrier of genetic 
information, the ribosome barely registers. 
Yet, without it, life as we know it would 
not have evolved. Every living cell contains 
millions of these complex structures, each 
consisting of a large and small subunit and 
incorporating a few dozen proteins and long 
strands of RNA. They are themselves protein 
factories. Each ribosome works along a mol- 
ecule of messenger RNA, reading its copy of 
the genetic instructions encoded in DNA. 
Guided by this, ribosomes assemble protein 
chains that then float off to do the work of 
building and running the body. 

This much was already known when 
Ramakrishnan began to work on the ribo- 
some in 1978 — although, as he writes, “we 
had no idea how it did even one of the many
complicated steps involved in making a
protein”. The race for the ribosome would
reframe it as an intricate machine with 
numerous working parts — which you can 
now watch in action on YouTube. 

Indian-born, with academic parents, 
Ramakrishnan studied physics at the 
University of Baroda (now Vadodara) in 
Gujarat. In 1971, he went to Ohio University 
in Athens for graduate work in theoretical 
physics. By the time he had submitted his 
thesis, he had decided to switch to biology. 
Reading around a subject about which he 
knew almost nothing, he picked the ribo- 
some on the strength of a Scientific American 
article co-authored by biologist Peter Moore. 
Within a few years he was on the starting 
blocks, as a postdoc in Moore’s laboratory at
Yale University in New Haven, Connecticut.

Venki Ramakrishnan.

It was clear that the only way to under- 
stand the ribosome was to solve its 3D 
structure at the resolution of individ- 
ual atoms. Ramakrishnan retrained in 
X-ray crystallography, then the preferred 
technique of structural biologists. 
Structural biologists face near-impossible 
challenges. These include making floppy, 
irregular proteins form crystals to bom- 
bard with X-rays; obtaining a diffraction 
pattern from the irradiated crystals before 
the structures disintegrate; and solving the 
ever-present ‘phase problem’, the ambiguity
in a diffraction pattern caused by peaks in
an X-ray wave giving the same intensity as 
troughs. Eventually, software generates maps 
of electron density at high-enough resolu- 
tion to ‘see’ the individual atoms. For the big, 
complicated ribosome, many research teams 
in different countries worked on each of these 
problems successively or simultaneously. 
The acknowledged founder of the ribo- 
some field is Israeli crystallographer Ada 
Yonath. She, with colleagues at the Max 
Planck Institute for Molecular Genetics in 
Berlin, published the first successful crystal- 
lization of a ribosomal subunit in 1980. An 
unwritten rule in crystallography at the time 
was that once someone had a crystal, every- 
one else would leave it to them to progress 
to atomic resolution. Ramakrishnan writes 
that the ribosome was different because of 
its importance, and the fact that years had 
gone by “without much apparent progress 
towards an actual structure”. By 1995, four 
groups around the world were in competition, 
and several others were contributing
key techniques to … crystallization or locate
atoms.

Gene Machine: The Race to Decipher the Secrets of the Ribosome
VENKI RAMAKRISHNAN
Oneworld (2018)

In 1999, Rama- 
krishnan moved to the mecca of structural 
biology, the Medical Research Council Lab- 
oratory of Molecular Biology in Cambridge, 
UK. Scientists at this one lab had already 
garnered seven Nobel prizes. The institution 
has now racked up 11 — more than some sci- 
entifically advanced nations. Ramakrishnan 
focused on the smaller of the ribosome’s subu- 
nits, which decodes mRNA before the larger 
subunit assembles the protein. 

By 2000, he and his colleagues had solved 
its structure, finishing neck and neck with 
the team of Thomas Steitz at Yale, which 
had solved the large subunit. Previously 
seen as a rank outsider, Ramakrishnan was 
propelled into the limelight. A Nobel for the 
ribosome work was being openly discussed. 
There were up to six potential contenders 
for a prize that can be split only three ways. 
James Watson twice told Ramakrishnan that 
he shouldn't mind not being one of the three 
when the time came. Yet, in 2009, the time 
did come. Yonath, Ramakrishnan and Steitz 
shared that year’s Nobel Prize in Chemistry. 

Ramakrishnan credits his wife, the art- 
ist Vera Rosenberry, with keeping him 
grounded: on hearing of his prize, she said, “I 
thought you had to be really smart to win one 
of those!” He reflects on the disproportion- 
ate status such recognition brings. Suddenly, 
you're showered with other honours, and 
expected to pronounce sagely on everything 
from climate change to human cloning. 

Some readers might take issue with how 
events or personalities are presented in Gene 
Machine. Yonath’s pioneering work is fully 
acknowledged, for example. Yet, as Rama- 
krishnan’s principal competitor, she some- 
times appears in an unfavourable light. This 
is not an objective history of the field, but a 
highly personal account. As such, anyone 
who wants to know how modern science 
really works should read it. It’s all here: the 
ambition, jealousy and factionalism — as well 
as the heroic late nights, crippling anxiety 
and disastrous mistakes — that underlie 
the apparently serene and objective surface 
represented by the published record.


Georgina Ferry is the author of many 
books, including Dorothy Hodgkin: A Life. 
e-mail: georgina.ferry@gmail.com 
Political solutions
can beat algorithms 


Regarding the computing arms 
race in US voter redistricting 
(W. K. Tam Cho Nature 558, 
487; 2018) between voting-
rights advocates and users
of sophisticated software for
gerrymandering, a political
solution could be simpler
and more effective than a
technological one. 

Many criteria for electoral 
mapping compete with one 
another — such as population 
equality, compactness, 
maintenance of political and 
geographical boundaries and 
respect for communities of 
interest. Politicians can therefore 
argue for personally advantageous 
computer-optimized electoral 
maps while plausibly denying any 
nefarious intent to disenfranchise 
specific voters. However, turning 
the process over to an algorithm 
merely shifts the debate to the 
fairness of the algorithm itself. 
Computers might be impervious 
to the lure of power; their users 
are not. 

Technology cannot readily 
resolve social problems that are 
based on conflicts over values 
and interests. To improve the 
ailing US political system, 
the country should instead 
consider a move to proportional 
representation — used in some 
form by many democratic 
nations. This would be much less 
susceptible to gerrymandering 
than the current winner-takes-all 
US voting system. 

Daniel J. Rozell Stony Brook 
University, New York, USA. 
daniel.rozell@stonybrook.edu 


Shark’s DNA should 
calm the waters 


Alarm quickly spread after two 
children were bitten in July 
while swimming off Fire Island, 
New York, because great white 
sharks (Carcharodon carcharias) 
frequent the region. We can 
vouch from DNA analysis that 
another, relatively harmless, 
shark species was responsible for 


biting one of the individuals, and, 
in our view, probably the other 
as well. 

We extracted DNA from a
decontaminated fragment of 
shark tooth recovered from one 
of the bite wounds. Comparison 
with mitochondrial DNA 
sequences of some 900 species 
of cartilaginous fish enabled us 
to identify the DNA source as 
a sand tiger shark (Carcharias 
taurus; unpublished results). 
This shark is generally not 
considered to be dangerous to 
humans (J. I. Castro The Sharks 
of North America Oxford Univ. 
Press, 2011), despite its size (up 
to 3 metres long and weighing 
more than 200 kilograms). 

The incidents occurred some 
7 kilometres apart and within 
minutes of each other. This is 
not as surprising as it might 
seem — sand tiger sharks prey 
on schooling fishes, tracking 
them as they move inshore.
This makes it more likely that
the sharks will mistake nearby
swimmers for prey and bite them. 
However, such random events are 
extremely rare. 

Gavin J. P. Naylor* University of 
Florida, Gainesville, USA. 
gnaylor@flmnh.ufl.edu 

*On behalf of 4 correspondents (see 
go.nature.com/2nwptfh for full list). 


Screen evidence for 
power and bias 


In our view, the four principles 
for making evidence synthesis 
more useful for policy
would be strengthened by
taking power and bias into
account (C. A. Donnelly et al.
Nature 558, 361–364; 2018).
Otherwise, the principles could 
fall short for issues that involve 
uncertain facts, disputed values, 
high stakes and urgent decisions,
as in global biodiversity
loss and climate change, for
example. 

Sometimes, complexities in 
scientific evidence allow several 
contrasting but equally valid 
interpretations. In such cases, 
there is a risk that privileged 
stakeholders associated with
one way of thinking might
unduly influence the particular 
values and interests prioritized 
in that synthesis. 

Scientific aspirations, 
integrity and practices are 
crucial for challenging this 
authority. But if scientific 
disciplines and organizations 
deny or become complacent 
about their own forms of bias, 
then claims that purport to be 
definitive and objective could 
distort decision-making. 

Evidence synthesis therefore 
needs to highlight contrasting 
valid framings of the best 
available evidence. A plural 
and conditional picture that 
is rigorous in embracing both 
social and natural sciences 
is more robust than single, 
evidence-based prescriptions. 
Analyses are inevitably 
influenced by politics. By 
improving transparency, those 
who hold power and privilege 
in and around science become 
more accountable. 

We therefore suggest adding 
a fifth principle of open- 
mindedness, with mandates 
to examine the evidence from 
outside as well as inside science; 
to explain how contrasting 
values and interests yield 
divergent interpretations and 
prescriptions; and to evaluate 
the effects of power and 
privilege within established 
practices of evidence synthesis.

Andy Stirling University of
Sussex, Brighton, UK. 

Clive Mitchell Scottish Natural 
Heritage, Battleby, Perth, UK. 
clive.mitchell@nature.scot 


EU politicians must 
trust plant science 


The latest ruling by the European 
Court of Justice requires that 
crops created using gene-editing 
techniques such as CRISPR must 
go through the same lengthy 
approval process as conventional 
genetically modified (GM) 
plants (see Nature 560, 16;
2018). This has surprised many
scientists, who are concerned 
that it will complicate promising 


applications of gene editing. 

The court took existing 
legislation into account in 
arriving at its decision, but the 
situation has changed greatly 
since the first directives on GM 
organisms in 1990. Hundreds 
of millions of hectares have 
been planted worldwide with 
GM crops, providing extensive 
experience with such products. 
And techniques developed 
since could potentially solve 
important questions in biology 
and agriculture. 

The court concluded that the 
European legislation considers 
the use of recombinant-DNA 
techniques in gene editing as 
sufficient grounds for classifying 
genome-edited plants as 
genetically modified. This 
could result in a costly approval 
process and might generate 
problems with unregulated 
genome-edited products 
imported from countries such as 
the United States. 

One possibility would be to 
alter the legislation, but this 
could be difficult given current 
European politics. Another 
would be to revisit the European 
directives issued since 1990, 
which were based on a case- 
by-case scientific analysis of 
GM plants. 

As members of the European 
Food Safety Authority's panel 
on GM organisms since its 
inception, we have witnessed a 
mounting distrust of scientific 
assessments. That has manifested 
with the approval of rules that 
demand a rigid analysis of 
GM plants. We need to reverse 
this trend, for example by 
acknowledging that approval of 
genome-edited plants calls for 
much less data than classic GM 
organisms, and by commanding 
greater respect for the work of 
scientific panels. This would 
promote scientifically sound risk 
analysis while complying with 
existing directives. 

Josep M. Casacuberta, Pere 
Puigdomenech Centre for 
Research in Agricultural 
Genomics, Barcelona, Spain. 
pere.puigdomenech@ 
cragenomica.es 
PLANETARY SCIENCE


Jupiter’s magnetic field revealed 


The magnetic field of Jupiter has been found to be different from all other known planetary magnetic fields. This result 
could have major implications for our understanding of the interiors of giant planets. SEE LETTER P.76 


CHRIS JONES 


NASA’s Juno spacecraft is currently
mapping Jupiter’s magnetic field in
unprecedented detail. Because the 
field originates in the planet's interior, it can 
provide insights into what is going on beneath 
the spectacular swirling clouds in the planet’s 
surface layers. On page 76, Moore et al. (ref. 1) ana-
lyse data from Juno and find that Jupiter's
magnetic field is substantially different in the 
planet’s northern and southern hemispheres. 
The authors consider what might be happen- 
ing in the planet's interior to account for this 
asymmetry. 

Juno reached Jupiter on 4 July 2016, and it 
has been gathering data that are transforming 
our understanding of the planet’s deep inte- 
rior. Previously, we had only a broad-brush 
overview of Jupiter’s magnetic field. Juno has
brought the picture into much sharper focus,
allowing a revised model of the field to be
constructed. These advances were possible
owing to the close approach that Juno makes 
to Jupiter — the spacecraft flies only about 
4,000 kilometres above Jupiter’s surface as it 
dives into the planet’s gravitational field once 
every 53 days.

Jupiter has the strongest planetary magnetic 
field in the Solar System. Ironically, this field 
is the biggest threat to the Juno mission. High-
energy particles from the Sun are trapped in
the field, producing a hazard that is dangerous
to the electronics on which the mission 
depends. Fortunately, Juno was designed with 
protection against this and has survived so far. 

The magnetic field of Jupiter is maintained 
by electric currents that flow in the planet's 
interior. Jupiter is made up mainly of hydro- 
gen and helium, so it is quite surprising that 
it can conduct electricity at all. However, the 
extremely high pressure and density in the 
planet enable hydrogen to enter a state known 
as metallic hydrogen. Metallic hydrogen has
an electrical conductivity similar to that of 
metals, allowing electric currents to flow. 

Giant planets take billions of years to cool 
down after they are formed. Consequently, 
there is as much heat coming out of Jupiter's 
interior as is received by the planet from the 
Sun. This heat is carried by convection cur- 
rents, which stir the interior and produce the 
swirling clouds and storms, such as the Great
Red Spot, that are so beautifully captured by
Juno’s cameras. The convection-driven flows of
fluid in the interior are slower than the surface
winds, but they are strong enough to gener-
ate Jupiter's magnetic field by a process called
dynamo action.

Earth’s magnetic field is also produced by 
convection-driven flows in the planet's interior, 
but it is the planet’s liquid-iron core that allows 
electric currents to flow. The fields of both 
Jupiter and Earth are mainly dipolar: the
radial component of the field is mostly positive
in the northern hemisphere and mostly nega- 
tive in the southern hemisphere, as if the planet 
contained a bar magnet (Fig. 1a). Moore and 
colleagues report that the non-dipolar part of 
Jupiter's field is confined almost entirely to the 
northern hemisphere (Fig. 1b). This is in stark 
contrast to Earth’s field, for which the non- 
dipolar part is evenly distributed between the 
two hemispheres. 
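
The bar-magnet picture can be made concrete with a short numerical sketch. It assumes a purely axial dipole, for which the radial field at radius r and colatitude theta is B_r = 2 * g10 * (a/r)^3 * cos(theta), where a is the planet's radius. The function name and the coefficient value are illustrative assumptions and are not taken from Moore and colleagues' field model.

```python
import math


def axial_dipole_radial_field(g10, colatitude_deg, r_over_a=1.0):
    """Radial field of a pure axial dipole.

    B_r = 2 * g10 * (a / r)**3 * cos(theta), with theta the colatitude
    (0 at the north pole, 180 at the south pole). The result has the
    same units as g10.
    """
    theta = math.radians(colatitude_deg)
    return 2.0 * g10 * (1.0 / r_over_a) ** 3 * math.cos(theta)


# Placeholder dipole coefficient in millitesla (an assumption for illustration).
G10_PLACEHOLDER = 0.4

if __name__ == "__main__":
    # Evaluate at 90% of the planetary radius, the depth used for the maps in Fig. 1.
    for colat in (0, 45, 90, 135, 180):
        b_r = axial_dipole_radial_field(G10_PLACEHOLDER, colat, r_over_a=0.9)
        hemisphere = "north" if colat < 90 else "south" if colat > 90 else "equator"
        print(f"colatitude {colat:3d} deg ({hemisphere}): B_r = {b_r:+.3f} mT")
```

With a positive coefficient, the sign of the radial field flips across the equator, which is the dipolar pattern described above. Any non-dipolar structure, such as that reported for Jupiter's northern hemisphere, would appear as departures from this simple cosine dependence.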

Moore et al. suggest several possible 
explanations for the morphology of Jupiter’s 
magnetic field. One explanation concerns 
Jupiter’s core, the nature of which is still a 
mystery. Some models of the planet assume 
a compact core with a mass about five times 
that of Earth. But a much larger, dilute core is
also feasible, and could affect field generation.

Another explanation is that there are one or 
more stable layers of fluid deep inside Jupiter. 
Saturn is thought to have a stable layer in its 
interior, which could account for why its mag- 
netic field is almost completely symmetrical 
about the planet’s rotation axis, vastly dif-
ferent from the fields of Jupiter and Earth. In
Jupiter, these stable layers might be regions in 
which the composition of the fluid changes, 
partitioning the planet’s interior into zones. 
If the transition regions contained a helium 
concentration gradient, they could be bottom 
heavy, altering the fluid flow inside the planet 
and therefore the magnetic field. 


Figure 1 | Maps of Jupiter’s magnetic field. a, In the northern hemisphere of
Jupiter, the radial component of the planet’s magnetic field points mainly in the
positive (outwards) direction (yellow-red shades). Conversely, in the southern
hemisphere, the radial component points predominantly in the negative
(inwards) direction (green-blue shades). Such a configuration is known as a
dipole. The colour scale depicts the strength of the radial magnetic field in units
of millitesla. b, Moore et al. (ref. 1) report that the non-dipolar part of Jupiter’s radial
magnetic field is almost entirely concentrated in the northern hemisphere,
unlike all other known planetary magnetic fields. The maps in a and b illustrate
the magnetic field at a distance of 90% of Jupiter’s radius from the planet’s centre,
under the assumption that substantial electric currents in the planet all reside at
distances closer to the centre. (Adapted from Fig. 1e and Fig. 3a of ref. 1.)


36 | NATURE | VOL 561 | 6 SEPTEMBER 2018


© 2018 Springer Nature Limited. All rights reserved.


To investigate how planetary magnetic fields 
are generated, it is now possible to solve the 
fundamental equations that govern the fluid 
flows and the magnetic fields inside planets. 
The basic principles of dynamo action were 
laid down a century ago, but solving the
fluid-dynamo equations proved difficult. 
Computers have been able to handle the cal- 
culations required to model Earth’s dynamo 
only since 1995 (ref. 12). Nevertheless, much 
progress has been made, and computational 
models of dynamos can now capture many of 
the characteristics of Earth's magnetic field.

In the past five years, these models have 
been adapted to deal with the large variations 
in density between the interior and atmos-
phere of Jupiter, and can now be compared
with the field inferred by Moore and col- 
leagues. However, dynamo models depend on 
the internal structure of the planet, which in 
turn depends on the planet’s thermodynamic 
properties, electrical-conductivity profile 
and composition. Although these issues have 
been extensively explored, some uncertainty 
remains. Models of fields that are dipolar but 
broadly symmetric about the equator have 
been developed, as have models of fields that
are asymmetric but not dipolar. The chal-
lenge is therefore to formulate models of fields
that are both asymmetric and dipolar. 

Moore and colleagues’ suggested explana- 
tions for Jupiter’s field morphology can now 
be tested by dynamo modellers to discover 
whether the explanations are indeed compat- 
ible with Juno's observations. Exciting times 
lie ahead for the study of the interiors of giant 
planets, as modellers digest the information 
coming from Juno and begin to work out a 
clearer picture of the inside of Jupiter.


Chris Jones is in the Department of 
Applied Mathematics, University of Leeds, 
Leeds LS2 9JT, UK. 

e-mail: cajones@maths.leeds.ac.uk 
An immune response
with a sweet tooth 


A previously unknown pathway that enables mammalian cells to recognize 
infection and trigger an immune response requires a kinase enzyme in the host 
cell to bind a sugar molecule produced by infecting bacteria. SEE LETTER P.122 


JOHN-DEMIAN SAUER 


Bacterial infections are a major cause of
disease and death worldwide. The innate
branch of the mammalian immune
system, which recognizes and reacts to general 
characteristics of pathogenic organisms, has a 
key protective role. On page 122, Zhou et al.
describe a mechanism by which the innate 
immune system is activated in response to 
bacterial sugar molecules. This finding broad- 
ens our understanding of the types of molecule 
that can be recognized as hallmarks of bacter- 
ial infection and the host proteins that can 
recognize such molecules. 

A key advance in our understanding of 
how the innate immune system functions 
was the identification of proteins called 
pattern-recognition receptors (PRRs), which 
recognize ‘non-self’ molecules termed patho- 
gen-associated molecular patterns (PAMPs). 
Beginning with the Toll and Toll-like receptor
PRRs in the late 1990s, the identification
of PRRs and the PAMPs that they recognize
has proceeded at a breathtaking pace.

A key function of PRRs is to help drive 
the expression of secreted proteins called 
cytokines, which alert the immune system to 
the presence of infection. The transcription 
factor NF-κB is a central regulator of cytokine
expression. Zhou and colleagues studied
human cells grown in vitro to try to identify
pathways that activate NF-κB in response to
infection by the bacterium Yersinia pseudo- 
tuberculosis. This bacterium has a needle-like, 
multiprotein structure called a type III secre- 
tion system (T3SS), which is required for the 
direct transfer of bacterial proteins into host 
cells. T3SSs are evolutionarily conserved in 
many pathogenic bacteria. 

Zhou et al. took an unbiased approach 
and screened a collection of Y. pseudo-
tuberculosis genetic mutants to identify bacte-
rial genes that are linked to NF-κB activation
in response to infection. This led the authors
to focus on the enzyme HldE, which catalyses
steps in the biosynthetic pathway that gen- 
erates lipopolysaccharide (LPS) molecules. 
Figure 1 | Bacterial sugars trigger a host immune response. a, Zhou et al. demonstrate that in bacteria
such as Yersinia pseudotuberculosis, which has a multiprotein complex called a type III secretion system
(T3SS), and in other bacterial species lacking a T3SS, the sugar molecule ADP-β-D-manno-heptose
(ADP-Hep) can enter a host cell, by an unknown route (possibly through a transporter protein), and
can trigger a signalling pathway that drives inflammation. When ADP-Hep enters the host cell, it binds
to ALPK1, which activates the protein TIFA by adding a phosphate group (P) to it. The downstream
signalling pathway, not all the steps of which are shown, leads to activation of the protein NF-κB, which
drives the expression of cytokine proteins that stimulate an immune response to the infection. b, The
authors also report that if the bacterially produced sugar D-glycero-β-D-manno-heptose 1,7-bisphosphate
(HBP) enters the host cell (by a route that remains to be determined), it can be converted by host enzymes
of the NMNAT family into the molecule ADP-heptose 7-P. This binds to ALPK1 and activates the same
pathway as that activated by ADP-Hep.
LPS is an essential component of the cell
surface of a subset of bacterial pathogens called 
Gram-negative bacteria. 

Using genetically mutated bacteria and puri- 
fied sugar molecules, the authors sought to 
pinpoint the molecules in the LPS biosynthetic 
pathway that stimulate NF-κB activation. They
found that the presence of bacterial sugars,
including ADP-β-D-manno-heptose (ADP-
Hep) and D-glycero-β-D-manno-heptose
1,7-bisphosphate (HBP), in the host-cell
cytoplasm triggered NF-κB activation.
This is consistent with a study of Neisseria
meningitidis bacteria that demonstrated that
HBP can trigger NF-κB responses in host cells.
Crucially, Zhou et al. showed that ADP-Hep
is 100 times more potent than is HBP at acti-
vating NF-κB. They found that addition of
ADP-Hep to the extracellular environment
of host cells can activate NF-κB, suggesting
that dedicated host-cell transporter proteins
deliver ADP-Hep to the host's cytoplasm.

No PRR was known to recognize ADP-Hep. 
To search for one, the authors used a gene- 
editing approach to conduct a screen in which 
they generated random mutations in host cells 
and tested whether the mutations affected 
ADP-Hep recognition. They uncovered two 
candidate genes that respectively encode the 
kinase enzyme ALPK1 and the protein TIFA,
and showed that these are required for NF-κB
activation in response to ADP-Hep in host cells
(Fig. 1). A previous study had revealed that
TIFA is required for recognition of HBP from
N. meningitidis. ALPK1 and TIFA signalling
has also been linked to HBP-dependent host
activation of NF-κB in response to infection by
the bacteria Shigella flexneri and Helicobacter
pylori. Using biochemical approaches, Zhou
and colleagues demonstrated that ADP-Hep 
binds directly to the amino terminus of ALPK1. 
The authors solved the X-ray crystal structure 
of ALPK1 in a complex with ADP-Hep, and 
validated their structural model by testing 
the effect of mutations in ALPK1 that were 
predicted to impair its binding to ADP-Hep. 

Zhou et al. also generated ALPK1-deficient 
mice. The NF-κB-dependent production of
cytokines was significantly reduced in these 
animals after challenge with either ADP-Hep 
or the pathogenic bacterium Burkholderia ceno- 
cepacia, compared with results seen in animals 
that were not deficient in ALPK1. Moreover, the 
number of bacteria in the lungs of mice infected 
with B. cenocepacia was higher in ALPK1- 
deficient animals than in wild-type mice. 

Perhaps Zhou and colleagues’ most striking 
finding is that mammalian adenylyltransferase 
enzymes, specifically those of the NMNAT
family, catalyse a reaction that converts HBP
into a molecule called ADP-heptose 7-P, which
can act as a ligand by binding to ALPK1. Previ-
ous work had suggested that HBP is a PAMP
that can directly activate NF-κB. Although
HBP can be defined as a PAMP, given that it
is a bacterially derived molecule that triggers
a host response, Zhou and colleagues’ data
indicate that HBP must be converted to
ADP-heptose 7-P by host enzymes to trigger 
this response. The authors report slight dif- 
ferences in the way in which ADP-Hep and 
ADP-heptose 7-P bind to ALPK1, and use these 
differences to demonstrate why ADP-Hep and 
not HBP or ADP-heptose 7-P is the relevant 
ligand for ALPK1-mediated NF-κB activation,
at least in Y. pseudotuberculosis infection. 

Zhou and colleagues’ findings have impor- 
tant implications. Evidence that ADP-Hep 
is a PAMP adds to a growing awareness that
bacterial metabolites can act as PAMPs. Given 
that ADP-Hep is needed to synthesize an 
essential component of the outer membrane 
of most Gram-negative bacteria, this makes it 
an ideal PAMP. However, it is not known how 
this molecule, which is normally found inside 
the bacterium, reaches the cytoplasm of the 
host cell. In Y. pseudotuberculosis, this process 
requires the T3SS, although it is unclear 
whether ADP-Hep is actively transported 
or accidentally leaks through the T3SS, or 
whether it enters by the pores that the T3SS 
generates in the host-cell membrane. 

The authors report that bacterial species that 
lack a T3SS can still trigger the ALPK1 path-
way in an ADP-Hep-dependent manner, con-
sistent with the ability of purified ADP-Hep to
activate the pathway by an extracellular route. 
This suggests that a dedicated transport system 
might exist that allows the host cell to sample 
its extracellular surroundings for the presence 
of this PAMP, similar to the way in which cer- 
tain extracellular PAMPs are transported to the 
cytoplasm for recognition by host proteins.

Why does bacterial ADP-Hep exposure 
occur if it activates the innate immune system?
Perhaps its release is needed to fulfil some as 
yet unknown function. Pathogens often evolve 
mechanisms to evade or thwart an immune- 
system response. If pathogens have evolved 
strategies to avoid triggering an ADP-Hep-
mediated immune response, understanding
such strategies might suggest new therapeutic 
approaches to fight bacterial infections. 

The authors’ observation that host enzymes 
can convert bacterial metabolites that have 
poor immune-activating characteristics into 
potent PAMPs offers a new perspective on the
evolutionary battle between pathogens and 
their hosts. Although Zhou et al. show that 
ADP-Hep is the relevant immune-triggering 
ligand for Y. pseudotuberculosis infections, it 
remains to be seen whether HBP is converted 
into ADP-heptose 7-P during other bacterial 
infections. This issue is particularly relevant for 
pathogens (for example, Shigella) that invade 
the host-cell cytoplasm and that might shed 
PAMPs such as HBP directly into the cyto- 
plasm. Zhou and colleagues’ work also offers a 
fresh perspective on the types of molecule that 
can act as PAMPs or their PRRs, and where 
and how researchers should be searching for 
such molecules.
DNA tags light up
sugars on proteins 


Methods for imaging sugars attached to proteins — the protein glycoforms — are 
of interest because glycoforms affect protein movement and localization in cells. 
A versatile approach is now reported that uses DNA as molecular identity tags. 


TADASHI SUZUKI 


The attachment of sugar molecules
to proteins is one of the most com-
mon protein modifications, found in
all domains of life. Sugars attached to pro- 
teins are called glycans, and modulate the 
physicochemical and physiological prop-
erties of the carrier proteins. But tracking
and visualizing glycoforms (the specific
patterns of sugars attached to a protein) in
cells is challenging, particularly if you want to 
visualize several different glycoforms at once. 
Writing in Angewandte Chemie, Li et al. now
report a method for doing this that relies on the 
dynamic interactions of a set of DNA codes. 
Since the early 1990s, the use of fluorescent 
tags as labels for proteins has revolutionized 
Figure 1 | A method for visualizing sugars on proteins. a, A DNA sequence (the protein code)
is bound to the target protein through a sequence called an aptamer. The protein code is hybridized
(forms a double helix) with a second sequence, called the timing code. A sugar attached to the protein is
covalently attached to a ‘hairpin’ DNA, which contains a masked sequence that identifies the sugar (the
monosaccharide code). b, A ‘decoding’ DNA molecule is added that hybridizes with the timing code
(not shown), releasing the protein code so that it hybridizes with part of the hairpin. The hairpin opens,
unmasking the monosaccharide code. c, A second hairpin DNA is added, which is complementary to
the monosaccharide code; it also bears a fluorescent molecule (a fluorophore) at one end and a quencher
molecule at the other, which deactivates fluorescence. The hairpin hybridizes with the DNA containing
the monosaccharide code, opening the hairpin and allowing the fluorophore to fluoresce. The protein
code is simultaneously exposed by this process, and can take part in another cycle of reactions.


how cell biologists analyse protein movement 
and localization in cells. But even though the
types of glycan attached to proteins can affect 
their movement and localization, it has been 
difficult to visualize any particular glycoform. 
One way in which researchers have attempted 
to solve this problem is by using a technique 
called fluorescence resonance energy transfer 
(FRET). In this technique, a fluorescent mol- 
ecule (a fluorophore) is attached to a protein of 
interest and a second fluorophore is attached 
to a specific sugar; fluorescence occurs only if 
the two molecules come into close proximity 
through the attachment of the sugar to the pro-
tein. However, the need to use two different
fluorophores can limit applications, for exam-
ple by making it difficult to detect multiple
glycoforms of a protein in the same experiment.

Li et al. overcome this problem using an 
approach that they describe as a hierarchi- 
cal coding strategy, in which multiple 
single-stranded DNA molecules are used as 
identification codes to visualize specific sugars 
attached to a chosen protein (Fig. 1). The first 
DNA molecule used in the authors’ system 
contains a sequence (known as an aptamer) 
that specifically binds to the target protein. 
The aptamer is attached to another sequence 
(the protein code) that identifies the protein. A 
second DNA molecule, called the timing code, 
contains a sequence that is complementary to 
the protein code, and that therefore hybridizes 
(forms a double helix) with it. 
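
The complementarity that lets the timing code pair with the protein code can be sketched in a few lines. This is a minimal illustration that assumes perfect Watson-Crick pairing between antiparallel strands; the sequences and function names are hypothetical and are not the codes used by Li and colleagues.

```python
# Watson-Crick base pairs for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the strand that pairs with seq, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def can_hybridize(strand_a, strand_b):
    """True if strand_b is the exact reverse complement of strand_a.

    Real hybridization tolerates mismatches and depends on length,
    temperature and salt; this check ignores all of that.
    """
    return strand_b.upper() == reverse_complement(strand_a)


if __name__ == "__main__":
    protein_code = "ATGCGTAC"  # hypothetical protein code
    timing_code = reverse_complement(protein_code)  # designed to mask it
    print("timing code:", timing_code)
    print("forms a duplex:", can_hybridize(protein_code, timing_code))
```

Designing one strand as the reverse complement of another is what allows each code in the hierarchy to be masked by a partner strand until that partner is displaced.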

The third DNA molecule used in Li and 
colleagues’ system contains three segments. 
The first segment is complementary to the pro- 
tein code. This is attached to a second sequence 
called the monosaccharide code, which iden- 
tifies a specific sugar. The third segment has 
a sequence that enables the complete strand 
to form a structure known as a hairpin, which
masks the monosaccharide code. The hair-
pin DNA is covalently attached to the sugar
identified by the monosaccharide code. If the
hairpin-bearing sugar is in turn attached to the 
target protein, this can bring the hairpin into 
close proximity with the double helix formed 
by the protein and timing codes. 

The final key component of Li and 
colleagues’ system is another hairpin DNA, 
which contains a complementary sequence 
to the monosaccharide code and a sequence 
that can displace the protein code from a dou- 
ble helix. The hairpin also has a fluorophore 
attached at the 5′ end, and a ‘quencher’ mol-
ecule at the 3′ end. The quencher stops the
fluorophore from fluorescing when the hairpin 
is closed, but allows fluorescence when the 
hairpin opens. 

So how do all these components interact to 
decode the crucial DNA identifiers and allow 
glycoforms to be visualized? The process is 
triggered when a single-stranded DNA that is 
complementary to the timing code is added 
to the system. This DNA hybridizes with the 
timing code, thus displacing and exposing 
the protein code. The exposed protein code 
then hybridizes with the complementary 
sequence in the hairpin attached to the sugar, 
opening up the hairpin and unmasking the 
monosaccharide code. 

When the fluorophore-carrying hairpin 
is added to the system, the unmasked 
monosaccharide code hybridizes with the 
complementary DNA sequence in that hair- 
pin. The hairpin therefore opens up, allowing 
its fluorophore to fluoresce: in effect, a fluores- 
cent tag has been attached to the sugar, allow- 
ing it to be detected. The hybridization also 
unmasks the protein code, making it available 
for another reaction cycle. The key element of 
Li and colleagues’ system is that the protein 
code is physically associated with the target 
protein, because this ensures that only hairpin- 
bearing sugars that are attached or close to the 
protein can become fluorescent. 
The authors confirmed that the chain of
reactions occurs in a cell-free system in vitro 
and used it to identify two glycoforms of the 
MUC1 protein: MUC1 decorated with the
sugar fucose, or with another sugar called 
sialic acid. Crucially, the authors also showed 
that the fluorescent signals can be generated 
and detected on cells that had been modified 
using a method known as metabolic labelling
to incorporate hairpin-bearing sugars. 

An advantage of this method is that, because 
the choice of DNA sequences that can be used 
as labels is effectively infinite, many different 
glycoforms can be imaged, as long as the pro- 
teins and sugars can be specifically labelled 
with their own DNA codes. Moreover, the 
authors clearly showed that sialylated and 
fucosylated MUC1 could be simultaneously
detected using their method. One potential 
limitation, however, is that the DNA used 
was not observed to be transported into cells 
through natural processes, suggesting that 
intracellular glycoforms cannot be detected 
by this method. This could actually be an 
advantage for studies that focus on cell-surface 
proteins. 

A few issues will need to be clarified in 
future studies. For example, the efficiency of 
the decoding process is unclear. It is also not 
known whether sugars on molecules next to 
the target proteins might sometimes become 
fluorescent, as a result of DNA hybridiza- 
tion between the protein code and hairpins 
attached to sugars on neighbouring pro- 
teins. Because a large number of glycans are 
attached to MUC1, the method might not need
to be highly efficient to generate a detectable 
fluorescent signal for this protein, and any 
minute signals produced from neighbouring 
molecules would not be a serious problem. 
However, further experiments using other 
glycoproteins that have fewer sugars attached 
are needed to validate the method fully. 

Given that both the above issues might 
depend largely on the length of the DNA chains 
used, careful design of the DNA codes and of the 
aptamers will be essential for ensuring the spe- 
cific detection of other glycoforms. The prac- 
tical advantages and disadvantages of the new 
technique compared with other strategies for 
glycoform imaging that have been reported in
the past few years (including two methods
reported by workers from the same group as
Li et al.) also remain to be explored.

Nevertheless, Li and colleagues’ hierarchical 
coding strategy for glycoform imaging shows 
great potential, and could be an important step 
in the development of a system analogous to 
the use of green fluorescent proteins for pro- 
tein tagging — which is now standard practice 
for biologists. The ultimate goal is to visualize 
glycoforms in a way that will enable us to see 
what we want to see, rather than only what can 
be seen.
A systemic problem
with pesticides 


Exposure to a sulfoximine-based pesticide has substantial adverse effects on 
bumblebee colonies. This finding suggests that concerns over the risks of exposing 
bees to insecticides should not be limited to neonicotinoids. SEE LETTER P.109 


NIGEL E. RAINE 


Agricultural intensification has increased
our reliance on pesticides, including
insecticides. Although insecticides
are useful for controlling crop damage caused 
by insect pests, they can also affect beneficial 
insects, potentially impairing their ability to 
control pests and pollinate crops, qualities
on which farmers rely. Indeed, increases in
insecticide use are one of several major fac- 
tors implicated in the worldwide declines of 
insect pollinators. A commonly used class
of insecticide called neonicotinoids has hit 
the headlines because of its impacts on bees. 
Siviter et al. report on page 109 that a potential
neonicotinoid replacement, the sulfoximine- 
based insecticide sulfoxaflor, also harms these 
crucial pollinators. 

Insect pollinators that forage on neonicotin- 
oid-treated plants can be exposed to small 
amounts of insecticide each time they or their 
larvae feed on pollen and nectar. Although
such chronic neonicotinoid exposure typi-
cally does not kill bees, it can have sublethal
effects: impairing a range of behaviours such
as learning and foraging, affecting nesting
success, colony development and reproduc-
tion, and reducing pollination levels.
Because of this, substantial restrictions on 
neonicotinoid use have been introduced in 
some regions of the world, particularly Europe. 
Such restrictions might seem to be good news 
for bee health — but only if the insecticides 
that replace neonicotinoids are less harmful to 
insect pollinators. 

Similar to neonicotinoids, sulfoximine-based 
insecticides are absorbed and systemically dis-
tributed throughout the plant. Sulfoximines
are one candidate to replace neonicotinoids,
and have already been widely approved for
use. Siviter and colleagues set out to assess the
sublethal effects of sulfoxaflor on the agricul-
turally important pollinator Bombus terrestris.
This bumblebee is common in the wild, and is
also reared commercially for crop pollination. 
Although it is convenient to use commercially 
reared colonies for experiments, the authors 
chose to use wild colonies — a decision that 
should be lauded because it enhances the 
ecological realism of their study. 

Siviter et al. collected 332 wild queen 
bumblebees, assessed them for parasites and 
used 249 uninfected individuals to start colo- 
nies in the laboratory. The authors succeeded 
in rearing colonies from 52 queens, providing 
a robust sample size for their experiment. They 
then randomly allocated pairs of size-matched 
bee colonies to either control or insecticide- 
exposure groups. The colonies fed at will 
for two weeks on either sugar water alone or 
sugar water containing five parts per billion
of sulfoxaflor (a concentration found in the 
nectar of crops sprayed with sulfoxaflor), 
before being moved outdoors, so that the 
researchers could monitor bee behaviour and 
colony development under field conditions. 

The team found that sulfoxaflor exposure 
had substantial and consistent effects on the 
rate of colony growth, which became appar- 
ent after just two to three weeks in the field. 
Sulfoxaflor-exposed colonies produced fewer 
female workers than did control colonies. 
They also produced 54% fewer reproductive 
offspring. This substantial difference was pre- 
dominantly driven by a decrease in the total 
number of males produced, but also reflects 
the fact that all of the 36 new queens produced 
came from just 3 of the control colonies. Such 
strong variation in queen production among 
control colonies is not unexpected, but the 
lack of queen production by any of the insecti- 
cide-exposed colonies is concerning, because 
queens are needed to start new colonies in the 
following year. 

These impairments in colony growth and 
reproduction are similar to those observed 
in comparable neonicotinoid-exposure 
studies. This similarity might be
expected, given that both insecticide classes
affect insects by binding to the same neuro-
transmitter receptors. But whereas the
Figure 1 | Routes of bumblebee exposure to insecticides. Siviter et al. have investigated how exposure
to the insecticide sulfoxaflor affects bumblebee colonies, using a combined laboratory-field protocol.
There are multiple potential routes of exposure to systemic insecticides. a, In spring, insecticide-treated
seeds are sown. Contaminated dust from seed planters drifts across fields, and lands on wild flowers
(insecticide residues are indicated by red diamonds, routes of spread by red arrows). Residual insecticide
in the soil from the previous year might affect queen bumblebees hibernating in the soil, or be taken up
by wild flowers, leading to exposure of foraging queens that consume contaminated nectar and pollen.
b, In summer, crops grown from treated seeds bloom, producing contaminated nectar and pollen (red
stripes). Spray treatments can increase insecticide levels on crops and on nearby wild flowers. Foraging
worker bees ingest insecticide-laced nectar and pollen from both treated crops and contaminated wild
flowers (refs 17, 18), and are exposed through contact with sprayed plant tissue when foraging on crops. Workers
take insecticide-laced pollen and nectar back to the colony, where it is ingested by larvae (not shown).
effects on bumblebee colonies exposed to
neonicotinoids seem to be driven by impaired 
pollen foraging (leading to limited nutrition
for larvae), the authors found no evidence 
that sulfoxaflor exposure caused significant 
differences in foraging performance. Perhaps 
early-stage colony growth and subsequent 
reproductive output were affected by sulfoxaflor 
toxicity to developing larvae, or by some other 
indirect mechanism — either way, the timing 
of declines in colony growth rate suggests that 
chronic sublethal stress at an early stage resulted 
in substantially reduced colony reproduction.
Correctly determining the effects of 
insecticides relies on accurate assessments of 
exposure, which varies depending on whether 
chemicals are applied by spray, soil drench or 
seed treatment (Fig. 1). For example, spray 
applications can lead to relatively high levels 
of exposure for a few days, whereas seed treat- 
ments can result in low-level, chronic exposure 
through residues in nectar and pollen. The
authors based exposure to sulfoxaflor in their 
experiment on a scenario in which bees ingest 
nectar from crop flowers following a spray 
application — currently, the most common 
mode of application for this insecticide class. 
However, this scenario discounts any 
exposure from contact with plant tissues 
or dietary exposure from crop pollen, and 
assumes that bees forage only on sulfoxaflor- 
treated crops — all factors that could affect 
exposure levels. Moreover, exposure pro- 
files would probably differ if sulfoxaflor were 
applied as a soil drench or seed treatment (an 
increasingly likely outcome following recent 
and probable future neonicotinoid regulation). 
Exposure could also be affected if sulfoxaflor, 
applied as a seed treatment or soil drench, 
moves outside crop fields and is absorbed by 
wild plants and contaminates their nectar and 
pollen, as reported for neonicotinoid seed treatments.
More data on sulfoxaflor concentrations
in the nectar and pollen of bee-attractive
crops are needed for an accurate assessment of 
the implications of sulfoxaflor use. 
Nonetheless, Siviter et al. provide a valuable 
first step towards understanding the effects 
of sulfoxaflor exposure on bees. Future 
discussions must be broader than two-way 
comparisons of neonicotinoids and sul- 
foximines, because other classes of systemic 
insecticide (such as butenolides and anthranilic 
diamides) are also in agricultural use. It is vital 
to ascertain which of these insecticide classes 
represents the lowest potential risk to pollina- 
tors. A major part of the answer depends on 
how comparative risk assessments are under- 
taken, including which of the 20,000 living 
bee species are considered, because there is 
substantial variation in physiology, behaviour 
and ecology between these species. Such differences,
particularly the extent to which
species are social, might affect the bees’ sensitivity
to insecticides. For instance, low-level
insecticide exposure might have more
impact on solitary bees than on highly social 


colonies that have an abundance of workers. 

Finally, commercially reared pollinators 
(particularly honeybees) feature prominently 
in global agriculture, but cannot provide all of 
the crop-pollination services needed. Wild
pollinators, including bumblebees and solitary 
bees, have a crucial, undervalued role that is 
likely to become increasingly important as 
our crop-pollination demands rise. Our
understanding of the risks to pollinators, and 
the choices we make about pest control, must 
evolve to reflect and balance these realities. 
There are no risk-free choices, but with more 
information such as that provided by Siviter 
and colleagues, we can make the most appro- 
priate decisions about how to produce the food 
we need without inflicting irreparable damage 
on the global environment and the essential 
ecosystem services (such as pollination) on 
which we depend.
Spotlight on proteins
that aid malaria 


The multiprotein complex PTEX enables malaria-causing parasites to survive
inside red blood cells. Studies reveal how PTEX assembles, and identify a function 
for one of the complex’s proteins, EXP2. SEE ARTICLE P.70 


TANIA F. DE KONING-WARD 


Malaria is caused by the parasite
Plasmodium falciparum. For part of
its life cycle, this organism resides
inside human red blood cells in a membrane-bound
compartment called a vacuole. To
survive, multiply and evade an immune
response in this environment, P. falciparum
must transport nutrients and proteins across
the vacuolar membrane. On page 70, Ho et al.
report the structure of the parasite PTEX complex,
which resides on the vacuolar membrane
and facilitates the export of proteins from the
vacuole to the cytoplasm of red blood cells.
And in a paper in Nature Microbiology, Garten
et al. reveal that the protein EXP2, which
forms part of the PTEX protein-conducting
channel located in the vacuolar membrane,
can also form a channel that facilitates nutrient
transfer across the membrane. These insights
into the structure and function of key proteins
that aid the survival of P. falciparum might help
efforts to develop new antimalarial drugs.

PTEX consists of five proteins: HSP101,
PTEX150, EXP2, PTEX88 and TRX2. Multiple
HSP101, PTEX150 and EXP2 molecules
assemble to form the core part of PTEX. It has
been predicted that HSP101 unfolds proteins
destined for export, and provides the energy
needed for cargo to pass through the vacuolar-membrane-spanning
part of the channel, which
is proposed to consist of EXP2. PTEX150 is
thought to have a structural role, connecting
HSP101 and EXP2.

Reduced expression of HSP101 or
PTEX150, or inhibition of the assembly of
HSP101 into the PTEX complex, results in
parasite death. PTEX is specific to species
of the genus Plasmodium and is not made
by humans. It is an attractive drug target
because it provides the only known route by
which parasite proteins enter the cytoplasm
of a red blood cell. However, PTEX’s relative
50 Years Ago


Everybody knows, of course, that 
there is a genuine and unavoidable 
conflict between the functions of 
museums as centres of scholarship 
and as places of entertainment. 
Curators can be forgiven for 
wishing that visitors would let 
them get on with serious work. 

In practice, the Science Museum 
seems to have mastered these 
yearnings quite successfully, and 
in the past few months there have 
been some welcome signs of an 
anxiety to please ... Yet there is a 
long way to go before the museum 
shoulders wholeheartedly its 
responsibility for seeing that 
people, and particularly young 
people, are provided with a 

vivid and contemporary vision 

of what science is like. Even the 
new children’s exhibition will 

not let the little creatures know 
about electronic computers, 

for example. 

From Nature 7 September 1968 


100 Years Ago 


Considerable interest was taken 
last week in the demonstrations 

of “reading by ear” at the British 
Scientific Products Exhibition. 
The original construction of 

Dr. Fournier d’Albe’s “type-reading
optophone” ... has recently

been modified by replacing the 
Nernst lamp by a small drawn-wire 
lamp, and by arranging the whole 
apparatus in such a manner that 
any ordinary book or newspaper 
can be inserted and read without 
cutting it up into pages or 
columns. The demonstrations 
consisted in taking an ordinary 
book... , opening it at random ... 
and asking the blind pupil to read a
few words or lines ... By a curious 
coincidence the first words thus 
read were “in the light”. The reader, 
a girl of nineteen blind from early 
infancy, was the first blind person 
to read by ear. 

From Nature 5 September 1918 
Figure 1 | Proteins that are essential for the malaria parasite’s survival. During part of its life cycle,
the Plasmodium falciparum parasite resides inside a membrane-bound vacuole in human red blood cells.
Ho et al. report the structure of a parasitic multiprotein complex called PTEX, which is present on the inside
of the vacuolar membrane. This complex is essential for parasite survival, and proteins are exported from
the vacuole through PTEX into the cytoplasm of red blood cells. The authors’ analysis provides a detailed
view of three of the proteins that form PTEX: EXP2, HSP101 and PTEX150 (only 20% of the structure of
PTEX150 was determined). The authors’ work also illuminates how proteins transit through the PTEX
complex. Garten et al. report that EXP2 can form a channel that enables nutrients to be transported across
the vacuolar membrane. Whether this occurs in both directions, or in only one, is not known.


novelty offers few clues to how it functions. 
EXP2 synthesized in the laboratory can 
form protein channels in lipid bilayers.
However, there have been no reports of full- 
length HSP101 or PTEX150 having been 
successfully synthesized for use in in vitro 
experiments. This has prevented structural 
analysis of the proteins, or reconstitution of 
the core PTEX complex in lipid membranes, 
to determine how the complex assembles 
and functions. 

Because of these experimental limitations, 
Ho et al. opted instead to extract PTEX
directly from red blood cells containing 
the parasite. Then, using a technique called 
cryo-electron microscopy (cryo-EM), the 
authors captured two distinct structural 
conformations of the core PTEX complex 
in the process of exporting unfolded protein 
cargo — they called these conformations the 
‘engaged’ and ‘resetting’ states. The cryo-EM 
analysis revealed that HSP101, PTEX150 and 
EXP2 assemble into an asymmetrical struc- 
ture containing six molecules of HSP101, 
seven of PTEX150 and seven of EXP2. These 
structures closely align with models of the 
organization and size of PTEX that had been 
predicted from biochemical and protein-analysis
experiments.

Ho and colleagues found that the seven EXP2 
molecules, which make up the protein channel 
in the lipid membrane, create a funnel shape, 
with the amino terminus of each molecule 
forming a transmembrane helix in the vacuolar 
membrane to provide an anchoring ‘stem’ 
(Fig. 1). The ‘mouth’ of EXP2 constitutes the 
bulk of the protein, and faces into the vacuole. 
This end of EXP2 contains a domain that 
tethers it to the carboxy-terminal domain of 
HSP101, situated directly on top. Only approxi- 
mately 20% of the structure of PTEX150 could 
be determined. Nevertheless, this was sufficient 
to reveal that each PTEX150 molecule slots in
between adjacent EXP2 molecules at the mouth 
of the EXP2 funnel, curling down towards the 
stem. Thus, PTEX150 provides a protective 
path for unfolded protein cargo transiting from 
HSP101 to EXP2. 

Of the three proteins, HSP101 displayed 
the greatest structural difference between the 
engaged and resetting states of PTEX, and on 
this basis the authors propose a mechanism for 
how cargo is threaded through PTEX’s central 
cavity. In this model, domains of the six assembled
HSP101 molecules form two ‘hands’ that
work together to thread unfolded cargo through
the PTEX150 and EXP2 funnel. In the engaged
state of PTEX, both the ‘active’ and ‘passive’
hands of HSP101 grasp the unfolded cargo.
The cargo is then fed downwards through the
central cavity of PTEX in a spiral fashion as it
passes from the active to the passive hand. In 
the resetting state, HSP101’s active hand moves 
upwards to grasp the next section of the cargo 
protein for transport, and the passive hand grips 
the cargo to prevent it from slipping backwards 
and away from the PTEX channel. 

The cryo-EM structures provide insight 
into several crucial interactions between the 
PTEX components. These interactions are 
potentially required for assembly and opti- 
mal function of the complex, and could be 
tested using genetic approaches to validate 
the model. Ho and colleagues were unable 
to determine the structure of the N-terminal 
domain of HSP101 that binds the protein 
cargo. Thus, it is unclear how cargo is recog- 
nized by HSP101, and whether cargo proteins 
are unfolded by proteins known as chaperones 
before they reach PTEX. Given that unfolded 
proteins pass through PTEX, these cargo 
proteins would then need to be refolded to 
function, presumably by other chaperone 
proteins. However, because EXP2 does not 


extend into the cytoplasm of red blood cells, it 
is unclear how chaperone proteins in the host 
cell might be recruited to cargo exiting PTEX. 
Garten et al. investigated EXP2 using in vitro
experiments, and report that it has another role 
in addition to its function in PTEX. Previous 
experiments using electrophysiological tech- 
niques have shown that a channel exists in 
the vacuolar membrane of parasite-infected 
red blood cells through which nutrients such 
as amino acids and sugars can pass, but the
identity of this channel has been a mystery. In 
electrophysiological studies, Garten and col- 
leagues demonstrated a direct relationship 
between the level of expression of EXP2 and 
the frequency of detection of the mysterious 
channel. When the authors generated a ver- 
sion of EXP2 that had a truncated C-terminal 
domain, which is located in the vacuole and 
is not required for protein export, this altered 
the voltage-response properties of the nutrient 
channel, leading the authors to conclude that 
EXP2 is indeed the elusive nutrient channel. 
That EXP2 might have a role separate 
from its function in PTEX is consistent 
with evidence that EXP2’s gene-expression 
profile differs from that of the other PTEX
components. Moreover, the authors found
that most EXP2 is not present in a complex
with PTEX. Although EXP2 is essential for
parasite survival, the contribution of the
EXP2 nutrient channel to parasite growth 
remains unknown. The channel could be char- 
acterized in detail if EXP2 was incorporated 
into lipid bilayers for in vitro experiments. 
The studies by Ho, Garten and their 
respective colleagues offer a close look at 
how major P. falciparum proteins func- 
tion. Interestingly, EXP2 is evolutionarily 
conserved among vacuolar-dwelling parasites
called apicomplexans. Perhaps the
nutrient-transiting capacity of EXP2 was 
adapted by P. falciparum to generate a 
protein-conducting channel that evolved 
through the recruitment of other proteins 
such as HSP101 and PTEX150. EXP2 and 
PTEX are expressed throughout the life 
cycle of P. falciparum, so drugs that target 
them might be highly effective at tackling 
malaria. These new insights into the interac- 
tions between the components of PTEX offer 
exciting possibilities for the development of
peptides or small molecules that might block
the function of this complex.


Designer atom arrays
for quantum computing


A key step in the development of quantum computers that use neutral atoms as
quantum bits is the assembly of tailored 3D arrays of atoms. Two laser-based
approaches have now been reported to do this. SEE LETTERS P.79 & P.83


NATHAN LUNDBLAD


Quantum computers and simulators are
of enormous interest because of their
potential to shed light on mysteries of
physics that are difficult to model using conventional
computers. Some physical platforms
used in realizing quantum-computing protocols
(including trapped ions and several
solid-state systems based on superconductors)
have received increased attention in the
past year. But in this issue, two groups report
technical breakthroughs that will aid the development
of another platform: trapped neutral
atoms. Barredo et al. (page 79) report their use
of precision optical-engineering methods to 
sort atoms into arbitrary 3D patterns, whereas 
Kumar et al. (page 83) construct cubic lattices
by revisiting a fanciful thought experiment 
known as Maxwell’s demon. The ability to 
organize neutral atoms exactly into planned 
3D arrays will be valuable for the development 
of neutral-atom quantum computers that use a 
large number of quantum bits (qubits). 
Arrays of isolated neutral atoms have long 
shown promise for quantum computing 
because neutral-atom qubits are extremely 
well isolated from environmental noise and 
are highly controllable, and also because such 
systems can be scaled up to large numbers of 
qubits. Given that controlled interactions
between atoms are needed to perform quan- 
tum-computing operations, neutral-atom 
quantum computers will need qubits to be pre- 
cisely arranged in a specified pattern. However, 


Figure 1 | A protocol for arranging neutral atoms in cubic optical lattices.
Kumar et al. report a method for arranging ultracold, neutral caesium atoms
in defined patterns in a cubic, 3D optical lattice (a series of laser-generated
potential-energy wells in which atoms can be confined). Only one layer of atoms
is shown, for simplicity. a, The atoms start off in random positions and in the
same electronic state (state A, red). The shaded square indicates a target region
that is to be filled with atoms. b, A combination of lasers and microwaves
(wavy arrow) flips the state of one atom into a different state (state B, turquoise).
c, A lattice shift is induced that moves the lattice and all atoms in state A half a
step to the right and those in state B half a step to the left. d, The atom in state B
is flipped back to state A. e, A reverse lattice shift moves the lattice and all atoms
in state A half a step to the left, so that the square region is now filled with atoms.
Figure 2 | A protocol for arranging neutral atoms in arbitrary
3D patterns. Optical tweezers are laser-generated optical traps that can
capture atoms. Barredo et al. formed 3D arrays of optical tweezers in
arbitrary patterns (each vertex of the array represents an optical tweezer),


developing methods for sorting atoms into 
patterns has proved challenging. Neutral- 
atom qubits require ultracold temperatures 
and extremely high vacuums to function, 
and therefore require complicated apparatus; 
ordering them into arrays using optical tech- 
niques adds an extra level of practical com- 
plexity. Progress has been made in arranging 
neutral atoms in one and two dimensions,
but 3D stacking will become essential as the 
number of qubits used approaches the hun- 
dreds, or to construct arrangements that have 
topologies not achievable in two dimensions. 
Kumar et al. have extended their previously 
reported approach to assemble cold clouds of
caesium atoms into a 3D lattice. The method 
begins with a randomly populated optical lat- 
tice: a trap formed from the interference pat- 
terns of counter-propagating lasers, in which 
atoms can be confined much like eggs in car- 
tons. After imaging and recording the random 
locations of atoms in the lattice, the authors 
implement a sorting protocol that involves 
intricately controlling the polarizations of the 
lattice lasers, while using additional ‘address- 
ing’ lasers and microwaves to position any 
given atom within a 5 × 5 × 5 array of lattice
sites (Fig. 1). In this way, up to 50 neutral atoms 
can be precisely ordered into an array that is 
suitable for use in a quantum computer. 
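
The state-dependent shift sequence shown in Fig. 1 (steps b to e) can be mimicked with a toy one-dimensional model. This is a minimal sketch under stated assumptions, not the authors' control software: the function name, the use of integer "half-site" bookkeeping and the occupancy checks are all introduced here for illustration.

```python
def move_atom_one_site_left(occupied, target):
    """Toy 1D version of the state-dependent shift sequence (steps b-e).

    occupied: set of integer lattice sites that hold an atom (all in state A).
    target:   site of the atom that should end up one site to the left.
    """
    if target not in occupied:
        raise ValueError("no atom at the target site")
    if target - 1 in occupied:
        raise ValueError("destination site is already occupied")

    # Track positions in half-site units so that half-steps stay integers.
    atoms = {2 * site: "A" for site in occupied}
    # b: flip the target atom into state B.
    atoms[2 * target] = "B"
    # c: lattice shift; state A moves half a step right, state B half a step left.
    atoms = {pos + (1 if state == "A" else -1): state
             for pos, state in atoms.items()}
    # d: flip the state-B atom back to state A.
    atoms = {pos: "A" for pos in atoms}
    # e: reverse shift; every atom (now in state A) moves half a step left.
    atoms = {pos - 1: "A" for pos in atoms}
    return {pos // 2 for pos in atoms}

# The atom at site 3 hops to the empty site 2; the others end where they began.
filled = move_atom_one_site_left({0, 1, 3}, target=3)
```

The net effect is that only the addressed atom changes site, which is why repeated cycles can fill a target region one vacancy at a time.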
Kumar et al. frame their sorting and prepara- 
tion protocol in terms of Maxwell’s demon. This 
thought experiment was proposed by James 
Clerk Maxwell in 1867, and explores the nature 
of entropy, a measure of disorder. Maxwell 
postulated that a reversible sorting mechanism 
(a sentient demon, although a non-sentient 
process would also work) could partition gas 
molecules into two sub-volumes. But this sort- 
ing process would lower the entropy of the 
gas in apparent violation of the second law of 
thermodynamics, which states that the entropy 
of any isolated system can only increase. How 
can this conundrum be explained? The answer 
turns out to be that the act of sorting inevitably 
increases the entropy of the Universe. Because 
the dominant entropy in Kumar and colleagues’ 
experiments is associated with the physi- 
cal arrangement of the atoms, their work is a 


realization of an omniscient Maxwell’s demon, 
summoned to organize the initial arrangement 
of a qubit array. 
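
For a rough sense of scale, the configurational entropy that such sorting removes can be estimated by counting arrangements. The sketch below assumes indistinguishable atoms, an initial state in which every arrangement is equally likely, and illustrative numbers (50 atoms on 125 sites); none of these assumptions or names comes from the papers.

```python
import math

def configurational_entropy(n_sites, n_atoms):
    """Entropy in units of k_B: the log of the number of ways to place
    n_atoms indistinguishable atoms on n_sites lattice sites."""
    return math.log(math.comb(n_sites, n_atoms))

# Illustrative values only: 50 atoms scattered over 125 sites.
s_random = configurational_entropy(125, 50)
# A single, fully specified target pattern has exactly one arrangement.
s_sorted = math.log(1)
entropy_removed = s_random - s_sorted
```

Consistent with the argument above, this entropy is not destroyed: the imaging and sorting steps export at least as much entropy to the surroundings.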

Meanwhile, Barredo et al. extend their previously
reported method for 2D atom sorting to
three dimensions. Their approach to disorder
and sorting is different from Kumar and colleagues’
method, but just as effective. They use a
holographic technique whereby a laser beam is
reflected off a spatial light modulator and then
focused to form traps known as optical twee- 
zers. In this way, they generate arrays of traps 
in arbitrary configurations that can be loaded 
with up to 72 cold rubidium atoms. To remove 
disorder and build the desired atomic configu- 
ration, the authors use a separate, movable opti- 
cal tweezer to pluck atoms from ‘wrong’ traps 
and either move them to correct sites or discard 
them (Fig. 2). This allows them to build qubit 
arrays in standard grid patterns, in topologies 
such as a Möbius strip, and even in the shape of
the Eiffel Tower (see Fig. 2 of the paper).

Barredo and colleagues go on to engineer 
an interaction between two qubits in a sorted 
array. To do this, they excite the atoms into 
‘Rydberg’ states, which produce atomic elec- 
trical dipoles that allow the qubits to sense each 
other through dipole-dipole interactions. By 
contrast, atoms in their ground states have 
vanishingly small dipole-dipole interactions. 
Rydberg interactions have previously been 
used to enable quantum-logic operations 
carried out by small systems of neutral-atom 
qubits, and could form the basis of both the
current groups’ future efforts to develop quantum
computers.

The two papers report similar milestones 
for the assembly of neutral-atom quantum 
computers, with Barredo and colleagues also 
reporting a working two-qubit interaction. 
However, the atoms in Barredo and col- 
leagues’ system are not as cold as they could 
be, which means that the entropy remaining 
in their arrays is substantially greater than in 
Kumar and colleagues’ system. The result- 
ing micrometre-scale motion of the atoms 
within the traps could limit the performance 
of future devices based on this system — a 
restriction that does not apply to Kumar 


and part-filled them with ultracold rubidium atoms, which initially reside 

at random positions. The authors then used a movable optical tweezer 

(not shown) to grab atoms at ‘incorrect’ positions and deposit them at desired 
positions, to produce precise arrangements of atoms. 


and co-workers’ apparatus. But Barredo and 
colleagues’ approach does allow qubit arrays of 
any spatial design to be made, whereas Kumar 
and co-workers’ apparatus generates only a 
cubic lattice. These differences might not be 
important for near-term quantum-computing 
goals, however. It remains to be seen whether 
quantum entanglement (a phenomenon that 
produces stronger correlations between parti- 
cles than those permitted by classical physics, 
and which fuels quantum-computing algo- 
rithms) can be created for such large numbers 
of working qubits. 

Both papers report technical tours de force, 
and showcase how far neutral-atom systems 
have come in terms of stability, reproduc- 
ibility and technical sophistication. The next 
step is probably to generate quantum entan- 
glement between arbitrary pairs of atoms in 
sorted arrays. It will also be interesting to see 
which exotic quantum states can be simulated 
using these qubit arrays, especially if some of 
those states cannot be modelled using existing
computational approaches. Finally, it
will be exciting to see whether the potential 
advantages of neutral atoms will now begin to 
pay dividends in the race to develop a working 
quantum computer.
Facing up to the global challenges of ageing


Linda Partridge, Joris Deelen & P. Eline Slagboom


Longer human lives have led to a global burden of late-life disease. However, some older people experience little ill health, 
a trait that should be extended to the general population. Interventions into lifestyle, including increased exercise and 
reduction in food intake and obesity, can help to maintain healthspan. Altered gut microbiota, removal of senescent 
cells, blood factors obtained from young individuals and drugs can all improve late-life health in animals. Application to 
humans will require better biomarkers of disease risk and responses to interventions, closer alignment of work in animals 
and humans, and increased use of electronic health records, biobank resources and cohort studies. 


During the last 200 years, average human life expectancy has
doubled in most developed countries (Fig. 1). Better quality of
water, food, hygiene, housing and lifestyle, immunization against
infectious disease, antibiotics and improved medical care first reduced
mortality in early life and, after about 1950, in people of 70 years of
age or older. Whether there will be a limit to human life expectancy is
vigorously debated, but survival rates in the elderly and mean life expectancy
are generally projected to continue to increase. In parallel with
longer lives, most aspects of age-specific health have also improved, with
increases in both physical and cognitive functioning during ageing in
successive birth cohorts.

Recent increases in human life expectancy have been much too rapid
for genetic change to have had more than a minor role. In contemporary
populations, individuals who survive to great ages are particularly
common in the so-called ‘blue zones’ of the world, Okinawa in Japan,
part of Sardinia in Italy, Ikaria in Greece, Nicoya in Costa Rica and Loma
Linda in the United States. These populations have not been found to be
genetically distinct from their neighbours, and environment and lifestyle,
including social networks, seem to have important roles in the healthy ageing
of these people. Factors such as diet, education and physical activity
throughout postnatal life have a cumulative effect on mortality, and conditions
during early life and parental health also have a large influence.

Improved health of people of all ages, including older people, and the
consequent increase in life expectancy, are to be celebrated as achievements
of civilization. However, healthy, disease-free lifespan (healthspan)
has not increased as much as lifespan. A global increase of five years in
total life expectancy between 2000 and 2015 has been accompanied by
only 4.6 years of healthy life expectancy (see http://apps.who.int/gho/data/view.main.SDG2016LEXREGv?lang=en).
An average 16–20% of life
is now spent in late-life morbidity, longer in females than in males, and
in individuals with a lower socio-economic status or obesity. Most
of us now live far longer than in our evolutionary past, to ages that have
not been shaped by natural selection. Advancing adult age is therefore the
major risk factor for chronic killer diseases, including cancer and cardiovascular
and neurodegenerative diseases (Fig. 2). The burden of these
conditions is now falling mainly on older people. Ageing impairs sensory,
motor and cognitive function, and thus lowers quality of life. Reduction
in the length and severity of late-life morbidity should therefore be a
major aim in civilized societies in the future. We shall refer to this goal as
‘compression of morbidity’.


Compression of morbidity should be achievable. First, individuals
who survive to over 100, 105 or 110 years show progressively greater
compression of late-life morbidity. Therefore, a relatively healthy
end to life is physiologically feasible and, if we could find the underlying
mechanisms, it might be possible to extend the trait to the general
population. Second, experimental work with laboratory animals, mainly
yeast, nematode worms, fruitflies and mice, has revealed the remarkable
malleability of ageing. Genetic, environmental and pharmacological
interventions can extend lifespan, ameliorate the loss of function and
diseases of ageing and, in some cases, compress late-life morbidity.
Although laboratory animals do not live as long as humans, ageing has
underlying mechanisms that are conserved over long evolutionary distances,
and these provide potential targets to maintain human health at
older ages. Indeed, similar life-extending interventions are effective in
different laboratory species.

Here, we address the opportunities and challenges for discovering 
the genetic and environmental determinants of human lifespan and 
healthspan, and in translating results of discoveries in animals into health 
improvements for ageing humans. We will not be able to abolish ageing,
but we do expect to be able to attenuate the process and greatly ameliorate 
its effects. 


Genetics of human lifespan and healthspan 

Genetic analysis of the marked individual variation in human lifespan
could identify potential targets for intervention, and several approaches
have been used (Box 1 and Table 1). Twin studies have suggested that
human lifespan is around 25% heritable, indicating that there is a
large, and possibly modifiable, effect of environmental factors on
lifespan. A recent study in a population of millions of individuals,
using the population pedigree, showed an even lower heritability of
only 12%. The variation in these figures is probably due to the difficulty
of accurately estimating common environmental and behavioural
effects within families. The heritability of lifespan is minimal for parents
who die between puberty and the age of 60, and then increases
progressively with death at later ages. Different measures, including
overall lifespan, healthspan and survival to exceptionally old ages (often
termed longevity), have been used in genetic studies. Multiple genome-wide
association studies (GWAS) of longevity have been performed,
and the only genetic locus to show robust, genome-wide significance
across studies is apolipoprotein E (APOE), a cholesterol carrier in
Fig. 1 | Cumulative survival and age-specific death rates in the
Netherlands in 1850, 1900 and 1950. a–d, Cumulative survival (a, b) and
mortality rates (c, d) in men (a, c) and women (b, d) based on 100,000
individuals per birth cohort (1850 (red), 1900 (blue) and 1950 (green))
from life tables from the Netherlands. c, d, Note that the y axis is a log
scale.


peripheral tissues and the brain that is also associated with the suscep- 
tibility to cardiovascular and Alzheimer’s disease”’ (Box 1). The general 
lack of replication of findings in independent studies may be attribut- 
able to different measures of survival and health, the age specificity of 
genetic effects*®, and different allele frequencies of lifespan-associated 
genetic variants in different birth cohorts”’. 
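
To make the twin-study heritability figure quoted in this section more concrete, the sketch below applies Falconer's classical approximation, h² = 2(r_MZ − r_DZ), to a pair of twin correlations. This is an illustration only: the text does not say which estimator the cited twin studies used, and the function name and the correlation values are hypothetical, chosen so that the result lands near 25%.

```python
def falconer_heritability(r_mz: float, r_dz: float) -> float:
    """Falconer's approximation: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


def shared_environment(r_mz: float, r_dz: float) -> float:
    """Companion estimate of the shared-environment share: c^2 = 2 * r_DZ - r_MZ."""
    return 2.0 * r_dz - r_mz


# Hypothetical correlations in lifespan between twin pairs (not from any study).
r_mz = 0.25    # monozygotic (identical) twins
r_dz = 0.125   # dizygotic (fraternal) twins

h2 = falconer_heritability(r_mz, r_dz)
c2 = shared_environment(r_mz, r_dz)
print(f"heritability h^2 = {h2:.2f}, shared environment c^2 = {c2:.2f}")
```

Because the estimate is a difference between two correlations, modest errors in either one, or environmental effects shared unequally by the two twin types, shift h² substantially. This is in line with the difficulty of separating familial environment from genetic effects noted in this section.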

Survival to advanced ages, particularly the 1-10% longest lived of 
the generation, is enriched in families, and members of these
families show a lifelong survival advantage, with lower risk of coronary
artery disease, cancer and type 2 diabetes and better immune and
metabolic health in middle and old age compared to the general
population and even to their spouses. However, the effects of
common, non-genetic influences in early life in these families cannot be
ruled out. Neither familial nor sporadic long-lived individuals
display a decreased load of common genetic risk variants for age-related
disease. However, any common protective genetic variant that is
responsible for familial longevity has yet to be found. Replicated studies
based on candidate genes emerging from studies in model organisms
can also be informative, and these have identified the FOXO3A
locus, which encodes a transcription factor, the homologues of which,
in model organisms, have a consistent role in healthy ageing.
Future genetic studies of longevity could focus both on establishing 
the longevity phenotype in multi-generational families, with whole- 
genome or exome sequencing of targeted cases, and on the multi- 
morbidity phenotype itself, rather than on proxies such as healthspan. 


Phenotypes and mechanisms of human ageing 

Human mortality rates reach a minimum around puberty and increase 
roughly exponentially thereafter (Fig. 1). Initially, the ageing process 
is manifested sub-clinically, in various types of physiological deterio- 
ration. From the third decade onwards, age-related changes in body 
composition occur, including loss of bone, cartilage, muscle mass 
and strength and gain of abdominal fat. Subsequently, systemic
changes occur, for instance, in the endocrine system, resulting in altered
hormone levels, and in the circulation, resulting in changes in blood
pressure and blood lipids. The responses of tissues to hormones can
also be affected, as in insulin resistance. Mechanical and structural
changes also occur, including vascular stiffness, which can affect heart
and brain functions. Eventually, these continuous sub-clinical changes
can culminate in a range of medically defined disease conditions in
middle age, with the co-existence of two or more chronic health
conditions in an individual being defined as multimorbidity. People with
higher levels of markers of disease risk in their blood (Box 2), and
those with multimorbidity, die up to 20 years younger than those with
lower levels. Late ages are frequently accompanied by frailty, a
composite index of ill health, functional and psychosocial deficits,
which increases the risk of falls, fractures, hospitalization, organ failure,
disability and death (Fig. 3).
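
The roughly exponential rise in adult mortality described at the start of this section is often summarized with the Gompertz model, m(t) = A·exp(G·(t − t0)). The sketch below is a minimal illustration under that assumption; the text itself says only 'roughly exponentially' and does not name a model. The starting rate, the doubling time and the function name are hypothetical and are not read from Fig. 1.

```python
import math


def gompertz_rate(age, rate_at_start, doubling_time, start_age=30.0):
    """Annual death rate at `age` under m(t) = A * exp(G * (t - t0))."""
    growth = math.log(2.0) / doubling_time  # G, per year
    return rate_at_start * math.exp(growth * (age - start_age))


# Hypothetical parameters for illustration only (not taken from Fig. 1).
RATE_AT_30 = 0.001     # one death per 1,000 person-years at age 30
DOUBLING_TIME = 8.0    # years needed for the death rate to double

for age in (30, 46, 62, 78):
    rate = gompertz_rate(age, RATE_AT_30, DOUBLING_TIME)
    print(f"age {age}: {rate * 1000:.1f} deaths per 1,000 person-years")
```

Under this model the logarithm of the death rate rises linearly with age, which is why mortality curves of this kind are usually drawn with a logarithmic y axis.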

The largest medical challenges in treating the growing number of
elderly patients are multimorbidity, present in at least half of the elderly
over 70 years, and the related use of five or more types of
medication (polypharmacy), which occurs in over 10% of the general
population and 30% of the elderly. Up to 12% of all hospital admissions
of older patients can be attributed to adverse drug reactions, which
most commonly involve anticoagulants, blood pressure lowering and
hypoglycaemic drugs, antiplatelet agents (aspirin) and nonsteroidal
anti-inflammatory drugs; the latter two contribute most frequently to
death after admission. Behavioural risk factors also have a major role.
Large multi-cohort studies in high-income countries have indicated
that the numbers of years lost because of smoking, physical inactivity
and high alcohol intake (more than 21 units per week for men, more
than 14 per week for women) are, on average, 4.8, 2.4 and 0.5 years,
respectively. Sedentary behaviour is especially common among older
people, who spend, on average, almost 10 waking hours in an immobile
posture. The World Health Organization (WHO) is therefore
targeting the major risk factors that have been identified so far, with the
overall aim of reducing premature mortality from non-communicable
diseases by 25% by 2025.

Further progress in preventing late-life ill health will come from 
better predictors of its occurrence, and from understanding how to 
intervene to block the causal mechanisms at an early stage. Different 
measures (biomarkers) can indicate the aetiology of ageing and its 
progress to disease states (Box 2). Physiological decline can partly be 
measured by standardized analyses of physical, respiratory and cog- 
nitive capacity, blood pressure and circulatory markers. Poor scores 
for these indicators during midlife are associated with an increased 
risk of morbidity and mortality over time. These markers can also
monitor health improvement in response to interventions, but do not 
yet robustly reflect all relevant aspects of ageing. Generating compre- 
hensive biomarker profiles that are capable of doing so is therefore 
important, and this field is progressing rapidly (Box 2). 

Preventative interventions into lifestyle aimed at slowing specific 
effects of ageing have presented a complex picture, with outcomes 
varying with the type of intervention, the age of the subjects and the 
population from which they are drawn. Some intervention regimes 
have been successful. For instance, treating adults at risk of diabetes by 
altering their diet, increasing their physical activity or both can be as 
effective as medication, and with better continued benefit, even for up 
to 15 years. Reductions in hypertension, diabetes and brain atrophy,
improved cognitive performance and reduction in mortality due to
cancer and cardiovascular disease, have all been achieved by alterations
to lifestyle. Specific diets, exercise, the two combined, cognitive
training and vascular risk management, caloric restriction,
intermittent fasting and supplementation of vitamin D have all been
reported to be effective for specific conditions. However, the response
to these interventions can show marked individual variation. It may 
become possible to target these interventions to those individuals who 
will benefit the most when robust biomarkers of the variation become 
available. For instance, older and more frail people could benefit from 
more dietary protein to combat traits such as muscle wasting and
weakness (sarcopenia), whereas middle-aged people may benefit from less
protein to combat cancer, although more direct evidence is needed, 
both from experimental work in animals and epidemiological studies 
in humans. 


Lifestyle interventions, while often beneficial, can be insufficient to
prevent the progress of age-related problems, partly because of
failures in compliance, and also because of limited and variable responses.
Drugs are an additional option, and are already in widespread use for
the prevention of cardiovascular disease, by pharmacologically
decreasing hypertension and low-density lipoprotein cholesterol in
healthy individuals who are at risk of cardiovascular disease (primary
prevention). Treatment of the elderly is complex, since the relation
between cardiovascular risk indicators, such as high body mass index,
blood pressure and blood lipids, and end points, such as mortality,
can change and even reverse with increasing age. The changing
correlation with age could indicate that pharmacological interventions
should depend on age and the presence of frailty and
multimorbidity. However, mortality may be selective, with those sensitive to
classical risk factors dying before the age of 70, or reverse causation may
occur, with age-related diseases leading to low body mass index and
blood pressure, and further work teasing out causality is needed. The
ageing process in animals shows evolutionarily conserved, parallel and
interacting mechanisms, known as hallmarks, that have proven to be
modifiable, and several of these are also well-documented in humans
(Table 2). They eventually lead to unrepaired damage in DNA,
accumulation of misfolded and aggregated proteins (for example, in the
brain and the retina) and senescent cells (for example, in joints and
kidneys) as well as to an inappropriate and persistent activation of stress
responses, such as in the innate immune system (inflammaging).


Fig. 2 | Disability-adjusted life years for age-related diseases in three
global regions. a–c, Disability-adjusted life years (DALYs) per 1,000
individuals are shown for men (left) and women (right). DALYs are shown
for malignant neoplasms (red), diabetes mellitus (blue), Alzheimer’s
disease and other dementias (green) and cardiovascular diseases (purple)
in different areas of the world for the year 2015 according to data from the
WHO. a, Europe. b, Africa. c, Western Pacific. One DALY represents
the loss of one year of health due to mortality or disability caused by the
indicated disease.
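
As a rough illustration of how a DALY figure of the kind plotted in Fig. 2 can be assembled, the sketch below uses the standard decomposition DALY = YLL + YLD (years of life lost to premature death plus years lived with disability). The caption states only what one DALY represents. The prevalence-based form of YLD used here is one common convention and is an assumption, as are the function name and every input value.

```python
def dalys(deaths, years_lost_per_death, prevalent_cases, disability_weight):
    """Return DALYs = YLL + YLD for one disease in one population."""
    yll = deaths * years_lost_per_death        # years of life lost
    yld = prevalent_cases * disability_weight  # years lived with disability
    return yll + yld


# Hypothetical inputs for a single disease in a single year.
population = 100_000
total = dalys(
    deaths=200,
    years_lost_per_death=10.0,   # mean remaining life expectancy at death
    prevalent_cases=5_000,
    disability_weight=0.25,      # 0 = full health, 1 = equivalent to death
)
per_1000 = total * 1_000 / population
print(f"{total:.0f} DALYs in total, {per_1000:.1f} per 1,000 individuals")
```

Expressing the total per 1,000 individuals, as in Fig. 2, allows regions with very different population sizes to be compared on one scale.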


To develop further interventions to compress morbidity, including 
drugs, we need a better understanding of the roles of individual age- 
ing mechanisms in different tissues and at different stages in life, and 
their contributions to the aetiology of age-related diseases. To this end, 
animal studies are useful to inform more targeted studies in humans. 


Translating discoveries from animal ageing

The model organisms that are commonly used in ageing research have
a much shorter lifespan than humans. However, they recapitulate many
features of human ageing. Furthermore, similar to many humans,
their culture and care regimes in the laboratory mostly protect them
from infectious diseases, provide them with abundant high-quality
food, restrict their exercise and remove many physical challenges. As
a consequence they, too, live to much greater ages than in their
evolutionary past. Conservation of mechanisms of ageing between animals
and humans extends to both the hallmarks of ageing and the genes
that are involved in ageing and age-related diseases. Different model
organisms best recapitulate specific aspects of human age-related
problems. Work across different model organisms has yielded biomarkers
(Box 2) that predict remaining lifespan, such as nucleolar volume and
telomere length, and these are promising candidates for inclusion
in a multivariate predictor of the rate of human ageing.

Ageing in animals has proved to be highly malleable in response
to environmental and genetic interventions. Various regimes of
dietary restriction are particularly effective, with increased lifespan and
a remarkably broad improvement in health during ageing in diverse
species, including rodents. Two studies of dietary restriction
in rhesus monkeys had slightly different outcomes, probably because of
differences in the composition of the control diet, the degree of
restriction and the timing of food provision. Lifespan increased with dietary
restriction in one study, while in both there were major improvements
in health in food-restricted animals, with reduced plasma triglycerides,
diabetes, cardiovascular disease, sarcopenia, incidence of neoplasms
and brain atrophy, which are the most relevant health parameters in
ageing humans. Multiple genetic interventions can also induce broad
improvements in health in laboratory animals. For example, reduced
activity of the insulin/insulin-like growth factor (IGF) signalling
(IIS)-mammalian target of rapamycin (mTOR) signalling network can
extend lifespan in yeast, worms, fruitflies and mice, and genetic
variants in candidate orthologous genes, or their gene expression patterns,
in humans can be associated with survival to advanced ages.
As for dietary protein, any benefits of modulating the activity of the
IIS-mTOR signalling network in humans may depend on age. The
network detects nutrition and stresses, and matches costly activities
such as growth, metabolism and reproduction to current
physiological state. Systems mediating major life history choices in response to
environmental cues thus have an important role in ageing. Despite
the complexity of the ageing process, with multiple hallmarks and
interactions between them, its effects can clearly be ameliorated in
animals. Notably, these interventions have also proven to be capable of
combatting the pathology in models of age-related diseases, including
cancer, neurodegeneration and cardiovascular diseases. It remains
to be seen whether human ageing will show a similar degree of
adaptability in response to interventions that have proven to be effective
in animals, but a major conclusion from ageing in model organisms
is that delaying, or even preventing, age-related diseases is a
realistic prospect. Furthermore, while no intervention studied so far
has improved all aspects of health, interventions that extend
lifespan generally also prevent more than one age-related condition
simultaneously (Fig. 4).


Box 1 | Genetic studies into the variation in human lifespan

Studies aiming to identify genes that influence human lifespan
initially explored candidate genes related to human age-related
diseases or to amelioration of ageing in laboratory animals. After
gene array-based technologies became available, genetic variation
in the whole genome was explored in linkage studies of longevity
families or GWAS of population-based older cases and younger
controls. The selection of cases for these studies was based on
longevity threshold criteria, such as survival of individuals to ages
above 90 or 100 years or individuals (or their parents) belonging
to the top 10% or 1% of survivors in a population,
or a continuous lifespan parameter, such as age at death of
individuals or their parents. The candidate gene studies
revealed only two loci that have been consistently replicated in
independent studies. The first is APOE, which is also the only
genetic locus that shows robust, genome-wide significance across
different GWAS (Table 1). The genetic variant responsible for ApoE
ε2 (rs7412) has been shown to be protective and the one for
ApoE ε4 (rs429358) has been shown to be deleterious. The
second is FOXO3A (see LongevityMap for an overview of
results from all candidate gene studies). Genomic locations that
were identified in linkage studies showed no overlap between the
different studies. The GWAS analyses, on the other hand, have
thus far identified several genetic variants (Table 1). However, the
majority of these are disease-related variants that influence early
mortality, rather than survival to extremely old ages. One of the
loci that was observed only in a single, large, parental longevity
study was also found to be associated with a diversity of
age-related diseases, and contained CDKN2A and CDKN2B, which
are involved in the development of cellular senescence, a hallmark
of ageing. Individuals from long-lived families may have genetic
variants that are rarer, which can only be identified by
(whole-genome) sequencing of family members. This approach has
already resulted in the identification of functional genetic variants
in the IGF-1 receptor (IGF1R).

Anti-ageing interventions that have proven to be effective in labora- 
tory animals are starting to be assessed for their feasibility, effectiveness 
and safety in humans. Although dietary restriction increases predic- 
tors of healthy ageing in human volunteers, it is not a realistic public 
health intervention, because of poor compliance with even mild (90%) 
dietary restriction regimes. However, more modest dietary
interventions may be realistic. For example, in mice modulation of the protein
content of the diet can both prevent over-consumption of protein-poor
diets and avoid the increased risk of cancer from high-protein diets.
The amino acid content of protein also determines its value to the
animal and can be modulated to reduce total protein consumption.
Timing of food intake can also be important. Food-restricted
mice and rats are usually fed once a day and consume their limited
food ration as soon as it is supplied, with a protracted fasting period
until the next day. This fasting period may be at least as important as
reduced food intake in promoting healthy ageing. Indeed, trials
in middle-aged humans with a fasting-mimicking diet, low in protein,
carbohydrate and calories, but high in unsaturated fats, have shown
beneficial effects on biomarkers of health, such as blood pressure and
levels of circulating IGF-1, with particularly strong effects in
individuals who are most at risk of disease. The time of day at which food is
consumed can also have substantial metabolic effects. These more
nuanced interventions need further exploration in both animal and
human studies, particularly with regard to the age specificity of their
effects and possible adverse effects at later ages.

Increasing attention is focusing on pharmacological manipulation of 
the mechanisms of ageing in animals, with a view to direct translation 
to humans to prevent age-related diseases. Development of new drugs 
to ameliorate human ageing would pose challenges for clinical trials 
since, in the absence of validated biomarkers of risk, a large, random, 
initially healthy population would have to be treated over a long period. 
At present, repurposing of existing drugs with a good safety profile 
is therefore a more realistic short-term prospect than de novo drug 
development. Because the mechanisms of ageing that have been
discovered in animals are proving to be important for human age-related
diseases, many are already targets of drugs that are licensed to treat
these diseases. There is an opportunity to widen the use of
existing drugs that are used to treat single, age-related diseases to prevent
multimorbidity. For instance, the licensed drug sirolimus (also known
as rapamycin) inhibits mTOR complex 1, part of the nutrient- and
stress-sensing network, and can extend the lifespan of model
organisms, including mice, in which it improves many, but not all, aspects
of health during ageing and protects against cancer. As in elderly
mice, the poor immune response of elderly humans to immunization
against influenza can be enhanced by pretreatment with the related
drug everolimus. The anti-diabetic drugs metformin and acarbose
can also extend lifespan in laboratory animals, and are currently
registered for clinical trials against ageing itself, which has not previously
been recognized as a valid target. The doses of drugs that are
effective for the prevention of the effects of ageing in animals are often
lower than those used clinically, so that side effects may be reduced,
and may be further prevented by making drugs, such as rapamycin,
more specific to their therapeutic target and by adjusting dosing
regimes.

Table 1 | Loci emerging from GWAS of discrete and continuous lifespan-related phenotypes in human studies

Closest gene(s) | Refs | Discrete phenotypes | Continuous phenotypes | Replication within publication | Replication between publications | Associations with age-related diseases
APOE | 141–145 | Age >99th percentile; age >90 years; age >100 years; parental age >90th percentile | Parental lifespan; age attained by parents | Yes | Yes | Multiple
CHRNA3 and CHRNA5 | 143, 144 | Parental age >90th percentile | Parental lifespan; age attained by parents | Yes | No | Cancer
LPA | 143, 144 | Parental age >90th percentile | Parental lifespan; age attained by parents | Yes | No | Multiple
CDKN2A and CDKN2B | 143 | Parental age >90th percentile | Parental lifespan; age attained by parents | Yes | No | Multiple
USP42 | 141 | Age >99th percentile | None | Yes | No | None
TMTC2 | 141 | Age >99th percentile | None | Yes | No | None
IL6 | 145 | Age >100 years | None | No | No | Inflammatory
ANKRD20A9P | 145 | Age >100 years | None | No | No | None
LINC02227 | 142 | Age >90 years | None | Yes | No | Cardiovascular
FOXO3A | 146 | Age >90 years | None | Yes | No | None
RAD50 and IL13 | 147 | Age >90 years | None | Yes | No | None
MC2R | 143 | Parental age >90th percentile | None | Yes | No | None
USP2-AS1 | 143 | Parental age >90th percentile | None | Yes | No | None
HLA-DQA1 and HLA-DRB1 | 143, 144 | None | Parental lifespan; age attained by parents | Yes | No | Inflammatory
ATXN2 | 143 | None | Age attained by parents | No | No | Multiple
FURIN | 143 | None | Age attained by parents | No | No | Cardiovascular
EPHX2 | 143 | None | Age attained by parents | No | No | Cancer
PROX2 | 143 | None | Age attained by parents | No | No | None
CELSR2 and PSRC1 | 143 | None | Age attained by parents | No | No | Cardiovascular

We included only studies that showed one or more genome-wide significant associations with lifespan-related phenotypes (P < 5 × 10⁻⁸), with the exception of the RAD50 and IL13 locus
(P = 5.42 × 10⁻⁷), which was based on the number of linkage disequilibrium-independent markers on the genotyping array (Immunochip) used in the study. We excluded studies that were based
on results from cohorts that were also included in more recent and larger studies. ‘Within publication’ refers to replication of a locus in different cohorts within the same publication. ‘Between
publications’ refers to replication of a locus in different cohorts from different publications.


Other recent discoveries about animal ageing are showing promise
for translation to humans. Cellular senescence, a hallmark of ageing
in both laboratory vertebrates and humans (Table 2), is a permanent
type of cell cycle arrest and is associated with resistance to cell death
and secretion of bioactive molecules, the senescence-associated
secretory phenotype (SASP). Cellular senescence is important during both
development and wound healing, where it has a key role in tissue
remodelling, but in these contexts the senescent cells are eventually
removed by macrophages. During ageing, senescent cells persist. Their
presence can cause tissue damage, and they are implicated in the
aetiology of human age-related diseases, including atherosclerosis,
osteoarthritis and cancer. Selective removal of senescent cells, or
disruption of the SASP, can restore tissue homeostasis and increase
healthspan and lifespan in mice. Although more work in
animals will be needed to assess the long-term effects and side effects
of this type of intervention, research is already directed towards the
possibility of improving the quality of tissues for transplantation, such
as kidneys, by prior removal of senescent cells, and clinical trials
are underway for the treatment of osteoarthritis and glaucoma. A
promising approach that has emerged from work on animals is
epigenetic reprogramming of aged cells to rejuvenate tissues, which has
extended lifespan in a mouse model of premature ageing. The myriad
of microorganisms present in the gut, the ‘microbiome’, is increasingly
implicated in the health of the gut itself and of other organs during
ageing. Although most work thus far has been descriptive rather
than experimental, transfer of the microbiome from young to
middle-aged turquoise killifish resulted in an increase in lifespan and a
delay in behavioural decline relative to fish that received a transfer from
middle-aged fish. The composition of the human gut microbiome
shows marked individual variation and is sensitive to many
environmental factors, including habitual diet, medication and long-term
residential care. Faecal transplantation from lean donors to patients
with metabolic syndromes can improve insulin sensitivity, and
probiotic studies in humans are underway, following positive results
in mice and safety assessment in humans. Further experimental
studies in animals are needed to explore the role of the microbiome
in ageing and age-related disease, and to use the findings to inform
the design of trials in humans. The systemic, circulatory environment
has also proved to play a key part in ageing. Experiments in which
the blood systems of mice were conjoined (parabiosis) showed that
impaired function of stem cells in multiple aged tissues could be slowed
or even reversed. Transfer of blood or plasma, and of plasma
proteins, from human umbilical cords has recently been shown to
rejuvenate hippocampal function in old mice, suggesting that there may
be evolutionary conservation of the effector molecules between mice
and humans. Identification of these is a high priority for research. The
practical accessibility of both the human microbiome and blood system
makes therapeutic manipulation a particularly attractive approach, but
research in animals is needed to establish the long-term consequences
and possible side effects.


Integrating research in animals and humans 

The increasing pace of discovery of the mechanisms of ageing in 
animals, burgeoning practical efforts to characterize and predict the 
phenotypes of human ageing, together with the recent appearance of 
databases of electronic health records, biobanks and more focused
long-term cohort studies, are opening new opportunities to discover 
the mechanisms that underlie the diversity in physiological deteriora- 
tion, multimorbidity and frailty and to intervene so we can attenuate or 
prevent these age-related problems. Further progress will be facilitated 
by collaboration between scientists who work in different fields. This 
will align efforts to test the effects of feasible interventions in humans 
and animal models on ageing biomarkers, hallmarks, multimorbidity 
and frailty at the individual level. Direct and standardized measures of 
Table 2 | Hallmarks of ageing investigated in human studies


Hallmark 


Genomic instability (refs 22, 77, 148)


Telomere attrition (refs 22, 148, 149)


Epigenetic alterations (refs 22, 148, 150–152)


Loss of proteostasis (refs 22, 148, 153)


Deregulated nutrient
sensing (refs 22, 95, 148, 154, 155)


Mitochondrial
dysfunction (refs 22, 148, 156–159)


Description 


Accumulation of genetic 
damage affects DNA 
integrity and stability 


Chromosome caps formed 
by repeated DNA 
shortening of telomeric 
DNA and DNA damage to 
telomeres 


Change in DNA 
methylation, non-coding 
RNA, histone modification 
and transcription 


Affects protein folding, 
degradation and repair 
by ubiquitin proteasome 
and lysosome autophagy 
affects synthesis of 
chaperones 


Detect concentrations of 
intra/extracellular nutrients 
(glucose, amino acids, AMP, 
NAD+) by insulin IGF-1
(IIS), expression of MTOR
signalling-induced FOXO 
transcription factors AMPK 
and sirtuin 


Decreased numbers with 
age compromises 
mitochondrial function 
upon energy demand, 
accumulation of reactive 
oxygen species, lipid 
peroxidation, impaired 
clearance of dysfunctional 
mitochondria (mitophagy) 
Change with age/health


Beneficial response 


(observational studies) 


Accumulation of somatic 
nuclear and mitochondrial 
mutations 


Pathogenesis of cancer 
and progeroid syndromes 


Shortening of telomeres 
in cells and tissues 

owing to cell division and 
damage associated with 
organ failure, disease and 
mortality 


Accelerated by smoking 
and obesity 


Global demethylation 
and, at promoter regions,
hypermethylation and 
increased variation at 
polycomb target regions 
in multiple tissues 


Induced by environmental 
effects (smoking, stress, 
trauma, alcohol and sun) 


Epigenetic clocks associate 
with health and disease in 
prospective studies 


Misfolded and aggregated 
proteins (in cataracts) and 
accumulation of 
autophagic vesicles in 
affected neurons in 
neurodegenerative 
disease 


Accelerated by obesity 


Autophagy is better
maintained in centenarians
and their families


Increased gene expres- 
sion of MTOR and IIS 
pathways with age in 
different tissues and with 
severity of brain disease 


Serum IGF-1 decreases 
with age and is associated 
with sarcopenia 


IIS gene variants are 
associated with long life 
(85+) 


Accumulation of reactive 
oxygen species and 
somatic mutations in 
mitochondrial DNA, clonal 
expansion and mosaic 
respiratory chain 
deficiency in multiple 
tissues 


Decreased synthesis of 
mitochondrial proteins in 
muscle 


Diversity of cancers, 
chronic obstructive 
pulmonary disease 
(respiratory disease), 
atherosclerosis and 
hypertension 


(intervention studies) 


Dietary energy restriction and 
increased exercise reduce 
oxidative and DNA damage, zinc 
de/repletion and DNA single- 
stranded breaks 


Mediterranean and plant-based 
diet and anti-oxidant 
supplementation slow down 
telomere shortening, CVD and 
mortality 


Bariatric surgery 


Folate and polyphenol 
supplementation 


Dietary energy restriction 


Dietary energy restriction 


(Intermittent) fasting 


Low protein intake in a cohort 
study and dietary restriction were 
associated with low serum IGF-1, 
but not in all studies, and restored 
insulin sensitivity 


Dietary restriction stimulates fatty 
acid oxidation and lowers 
oxidative damage 


High-intensity aerobic interval 
training in young and old 
improved cardiorespiratory 
fitness, muscle mass, protein 
abundance and insulin sensitivity 


Resveratrol supplementation 
tested for protection of lungs, 
cardiovascular and respiratory 
pathways; inconclusive owing to 
variability in studies and doses 


Causal evidence 


Mutations cause 
premature ageing 
syndromes 


Mutations in telomerase 
cause familial premature
disease, pulmonary 
fibrosis, dyskeratosis 
congenita and aplastic 
anaemia 


Loss of regenerative 
capacity 


Mutations in PS1 and PS2 
cause familial autosomal- 
dominant Alzheimer’s 
disease and result in 
amyloid deposition, 
neuronal loss and 
lysosome pathology 


Mutations lowering growth 
hormone and IGF-1 lower 
the incidence of cancer 
and CVD 


Mitochondrial mutations 
cause diseases with 
multiple ageing 
symptoms 




Hallmark 


Cellular senescence (refs 22, 78, 148, 160)


Description 


Arrest of cell cycle 
excretion of proteins 
(SASP) adversely impacts 
tissues and affects 
clearance by 


Change with age/health 


Beneficial response 


(observational studies) 


Accumulation with age 
in a variety of tissues 
preceding disease, but 
controversy exists
whether accumulation 


(intervention studies) 


Senotherapy (clearance of senes- 
cent cells) in human cell models in 
which senescence is induced 


Causal evidence 


Germ line and somatic 
mutations in CDKN2A 
contributes to increased 
risk of range of cancers 


inflammasome 


Accumulation in 
pathology (lung, kidney 
and cartilage), in biopsies 
and after therapeutic 


damage 


Association of genetic 
variation at the CDKN2A- 


occurs in healthy 
individuals 


Prevention of accumulation of 
senescence by metformin in 
human cell models in which 
senescence is induced 


Senolytic drug treatment 
of human osteoarthritic 
cartilage explants and 
cultures: depletion of 
senescent cells, 
chondrocyte proliferation 
and growth of the 
extracellular matrix in 
cartilage 


Compounds inducing senescence 
tested in cancer cells 


CDKN2B locus and 
multiple metabolic 


diseases 


Decrease in the 
regenerative potential of 
stem cells 


Stem-cell exhaustion!6!-164 
fibrosis 


Observed in pulmonary 


Loss of satellite cells in 


Regenerative medicine on the 
basis of mesenchymal stem cells, 
musculoskeletal damage repair 


muscle and decreased 
regeneration capacity 


Increased frequency of 
haematopoietic stem cells 
with impaired functionality 
and clonal expansion; 
however, the health 
consequences of these 
impairments remain 


unclear 


Altered intercellular 
communication’2 


Deregulated endocrine, 
neuroendocrine, neuronal 
signalling associated with 
chronic inflammation 
during ageing and decline 
of adaptive immune 
system or other inter- 
organ coordination (such 
as by the gut microbiome) 
through blood-borne 
factors 


Chronic inflammation and 
composition of the gut 
microbiome 


Chronic overexpression of 
basal levels of stress- 
related proteins, such as 
heat-shock proteins in 
older patients, ER 
chaperones, hypoxia 
factor (HIF1α) 


Poor corresponding 


Gastric bypass 


Calorie restriction 


Resistance exercise training 


adaptive response to 


stress 


Hallmarks of ageing as formulated for animal studies with adapted criteria: (1) manifestation during normal ageing in cross-sectional (comparison of young and old donors) or longitudinal (repeated 
measurements over time) studies; (2) aggravation is associated with a pathological condition (accumulates in diseased tissue, prevalent in patients or prospectively predicts health deficit); 
(3) intervention studies beneficially change aggravation; (4) removal of age-related changes increases health conditions, or aggravation causes accelerated ageing. There is no systematic approach 
yet to record the hallmarks of ageing in human studies for any of the above criteria and especially repeated measurements in longitudinal studies are missing. Hallmarks may not completely cover 
all relevant observations in humans, such as the adaptive homeostatic response’®. Evidence for the causality of the hallmark in human ageing mostly results from mutations causing juvenile forms of 
age-related disease or ex vivo experimental data, mostly in cell models and sometimes in tissues. PS1 and PS2 are also known as PSEN1 and PSEN2, respectively. CVD, cardiovascular disease. 


end-life multimorbidity itself are needed, in both animals and humans. 
Measures of healthspan and of age-specific multimorbidity, although 
informative, do not directly assess the duration or extent of multimor- 
bidity at the end of life. Few such studies are conducted, because they 
necessitate longitudinal information on individuals until they die, but 
they will be necessary to assess the effects of interventions on the com- 
pression of morbidity. 

The results of research into ageing in animals and humans are 
producing major dividends. Global public health efforts to increase 
human healthspan will increasingly focus on lowering the risk of 
obesity, smoking, high alcohol intake, physical inactivity, hypertension 
and low-density lipoprotein cholesterol, and success in doing so should 
yield widespread reductions in diabetes, cardiovascular disease and 
cancer. Repurposed drugs are also a promising approach to maintain 
human health during ageing, and new clinical trials are underway with 
candidates that include mTOR inhibitors'!° and metformin’. Drugs 
that kill senescent cells (senolytics) or that block the SASP also show 


great promise to induce repair of damaged tissue. If successful, the trials 
for the treatment of osteoarthritis and glaucoma could be extended 
to primary prevention among at-risk, elderly people if a consensus 
can be reached on surrogate end points of cartilage degradation and 
retinopathy. Ideally, preventative drug treatment in humans would 
start later in life, to minimize the duration of possible side effects of 
long-term medication use. However, clinical trials do not, in general, 
include older people, and evidence that drugs are effective, at which 
doses and whether they have the expected profile of side effects among 
the elderly, is needed but is often lacking. For instance, levothyroxine, 
which is widely used to treat older adults with slightly underactive thy- 
roid function, has proven to be ineffective in older people'**. The mech- 
anisms leading to this lack of efficacy in late life could be investigated 
in laboratory animals, particularly to understand whether treatment 
is effective only if started in middle age or even earlier. Polypharmacy 
is a major problem in older people, and model organisms could be 
used to find ways of minimizing its effects. Therapies based on cellular 
Box 2 

Biomarkers of the physiological state and biological 
age of individuals 


Biomarkers in human research are, on the one hand, used to 
detect individual variability in the progress of ageing, as risk 
indicators, and, on the other hand, for monitoring the response to 
interventions. Different biomarkers have been developed to answer 
different questions, for example, to monitor the physiological state 
of individuals, predict the onset and/or progression of age-related 
diseases, detect the physiological vulnerability of elderly to poor 
clinical outcomes or predict mortality. Biomarkers of the risk of 
age-related diseases have been developed with great success. No 
consensus has yet been reached on biomarkers of biological age, 
that is, the mismatch between chronological age and the stage of 
an individual along the ageing process. These biomarkers should 
ideally meet a number of criteria, such as those defined by the 
American Federation for Aging Research (AFAR): they should 
(1) mark the individual stage of ageing and predict mortality better 
than chronological age; (2) monitor ageing in a range of systems 
and not the effects of disease; and (3) allow longitudinal tracking 
(for example, by blood tests or imaging techniques) in animals and 
humans?!”1, 
Several types of biomarker of the physiological state include 
whole-system indicators of physical or mental capability (for 
example, locomotor function, strength, balance, cognition and 
activity during daily living), physiological reserve (for example, 
respiratory and cardiovascular function) and the systemic capacities 
to regulate lipid and glucose metabolism and immunity (for 
example, insulin, IL-6 and CRP)?3!72, In addition to single markers, 
multi-marker indicators have been generated based on assays 
of multi-organ functionality and/or molecular characteristics. 
Physiological vulnerability later in life, that is, ‘frailty’ at ages above 
80 years is generally described by low physical activity, muscle 
weakness, slowed performance, fatigue or poor endurance and 
unintentional weight loss. About 50 different frailty algorithms 
are available, the ‘frailty phenotype’!”3 and ‘frailty index’!”* being 
the most commonly used clinically. For early phases of life, other 
scores, such as the ‘Pace of Aging’ score*?, have been generated. 
More recently, multi-marker indicators of biological age have 
been based on age-related changes in the transcriptome!’®, 
epigenome!’®!77, metabolome?’8 and structural neuroimaging’? 
These await systematic testing and comparison with each other 
and with traditional parameters, in relation to clinical decisions 
and intervention studies. Different indicators of biological age 
(telomere shortening, epigenetic clocks and pace of ageing) seem 
to reflect different aspects of physiological decline!®°. Because 
long-lasting cohort studies contain many ageing phenotypes and 
a large amount of clinical, imaging and molecular data, collected 
at multiple time points, these studies could allow systematic 
comparisons and development of a multivariate mix of marker 
profiles with the strongest predictive power. 


reprogramming and systemic factors from young plasma also show 
great promise for application in tissue regeneration. 

For interventions into the ageing process to have maximum impact 
on ageing populations, they would ideally be effective as popula- 
tion-wide public-health measures. These would require an excellent 
safety profile and near-universal efficacy. However, the marked indi- 
vidual variation in the ageing process means that some interventions 
will be most effective when they are targeted at those people who are 
most at risk. When establishing risk of rapid physiological decline and 
age-related disease, and monitoring the response to interventions, 
Fig. 3 | Schematic representation of the timing and progression of age- 
related phenotypes in adult humans. a, Age-related phenotypes include 
loss of bone and muscle mass, gain of abdominal fat, mechanical and 
structural tissue changes, age-related diseases and frailty. b, Each organ, 
tissue, cell or trait deteriorates over time at different rates in different 
people, resulting in individual trajectories of functional decline during 
ageing, graphically represented by the different coloured lines. The purple 
line could, for example, represent an individual who rapidly gained 
abdominal fat during adulthood, reaching a plateau in midlife, with loss 
of muscle mass and strength in mature adulthood, resulting in a rapid 
decline of functional capacity of the locomotor system and development 
of age-related disease, such as osteoarthritis, accompanied by falls and 
fractures. The dark blue line, on the other hand, could represent someone 
who remains metabolically healthy until late adulthood, after which he 

or she suffers from a decline in kidney function, which also affects the 
cardiovascular system and can result in heart disease, as well as suffering a 
decline in cognitive capacity and ultimately frailty. 


blood is the most practically accessible and therefore the most com- 
monly investigated tissue, but it is much less commonly used in ani- 
mal studies. It will be important to develop blood-based biomarkers 
of risk, ageing hallmarks and responses to candidate interventions in 
animals. Mice are commonly used in studies of ageing and age-related 
disease, but other mammalian species may be more suitable for work 
on specific conditions, such as rats for thyroid function and blood 
pressure. Most laboratory mice are also inbred, with marked strain 
peculiarities, and animals that are more outbred would more closely 
mirror the individual heterogeneity that is typical of human popula- 
tions, although this problem is not confined to work on ageing. Some 
promising new models are also appearing that allow for parallel cell 
biological studies of animal and human ageing. Direct reprogramming 
of primary fibroblasts from individuals of different ages can maintain 
age-specific transcriptional profiles and decreases in nucleocytoplasmic 
compartmentalization, potentially providing opportunities to study 
age-related cellular changes in vitro'**. Organoids can also provide a 
three-dimensional context for the study of interactions of different cell 
types with each other and with the extracellular matrix’°. These sys- 
tems will facilitate ex vivo work on human ageing with more realistic 
material than conventional tissue culture. 

Further understanding of human ageing is coming from analysis 
of, for example, electronic health records and biobanks and detailed 
genetic and phenotypic data from clinical and longitudinal cohort 
studies. These can capture those features of human ageing that are not 
recapitulated by laboratory animals. The patterns of age-related disease 
Fig. 4 | Ageing is characterized by mechanistic hallmarks that 
contribute to ageing to different extents in different organisms, and in 
different cell types within an organism. Hallmarks can influence each 
other both within cells and at a distance. Different interventions to prevent 


and multimorbidity identified from these sources can then be tested for 
their association with genetic, molecular and other phenotypic charac- 
teristics. Expression of genes, proteins and metabolites associated with 
age-related diseases can provide more mechanistic insights, includ- 
ing the role of the hallmarks of ageing (Fig. 4). An initial correlation 
between, for example, increased levels of a protein and the incidence 
of a health condition can be investigated for causality by Mendelian 
randomization'?”!%8, in which the random assignment of genetic varia- 
tion to individuals at the zygote stage constitutes a natural experiment. 
Experimental studies in human cells, organoids and animals can then 
be used to analyse the mechanistic links between the protein and the 
condition. Data resources, such as the druggable genome’, can be 
used to determine whether the protein is a potential drug target of 
approved or novel drugs that could delay or prevent the condition. 
These approaches would benefit from standardized protocols to obtain 
biobank samples from older people at general practitioner and hospital 
visits, in order to obtain a more representative sample of the elderly 
population than available from current biobank and cohort studies. 
Novel assays using metabolic imaging now allow non-invasive record- 
ing of metabolic health status”. The accumulated longitudinal data 
and biological specimens that have already been collected in cohort 
studies can also be used to estimate the individual rate of change in 
specific biomarkers and outcomes. Robust biomarkers emerging from 
such systematic research can then be used as surrogate end points to 
indicate whether anti-ageing interventions are likely to have beneficial 
effects on clinical outcomes. 
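The Mendelian randomization step described above can be illustrated with a single-variant Wald ratio. This is a minimal sketch, not a method taken from the cited studies: the function name and all numbers are hypothetical, the genetic variant is assumed to be a valid instrument, and the standard error is a first-order approximation that ignores uncertainty in the variant-protein association.

```python
def wald_ratio(beta_zx, beta_zy, se_zy):
    """Single-instrument Mendelian randomization estimate.

    beta_zx: association of the genetic variant with the exposure
             (for example, protein level per allele).
    beta_zy: association of the same variant with the outcome
             (for example, log odds of a health condition per allele).
    se_zy:   standard error of beta_zy.

    Returns the estimated effect of the exposure on the outcome and an
    approximate standard error (assumption: beta_zx is known exactly).
    """
    if beta_zx == 0:
        raise ValueError("variant must be associated with the exposure")
    estimate = beta_zy / beta_zx
    se = se_zy / abs(beta_zx)
    return estimate, se


# Hypothetical per-allele effects, chosen only for illustration.
est, se = wald_ratio(beta_zx=0.2, beta_zy=0.05, se_zy=0.01)
print(f"estimated effect per unit of exposure: {est:.2f} (s.e. {se:.2f})")
```

Because the alleles are assigned at conception, the ratio is less exposed to confounding and reverse causation than the raw protein-condition correlation, provided the variant affects the condition only through the protein.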

The expanding proportion of unhealthy elderly people in many pop- 
ulations is indeed a global challenge to society. However, public health 
measures to reduce the risk of cancer, metabolic and cardiovascular 
disease can be effective and should be monitored in primary care. The 
success of any intervention to combat multimorbidity will be limited 
by the wish of individuals to reduce its effects and hence their com- 
pliance with preventative measures. However, for the willing, lifestyle 
adjustments and preventative drug treatments are already at hand, with 
a variety of promising new interventions on the near horizon. 


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
data, statements of data availability and associated accession codes are available at 
https://doi.org/10.1038/s41586-018-0457-8. 


or ameliorate symptoms of ageing can affect different groups of hallmarks, 
and different groups of hallmarks can contribute to the aetiology of 
specific age-related phenotypes and diseases. 
Integrating time from experience in the 
lateral entorhinal cortex 


Albert Tsao, Jørgen Sugar, Li Lu, Cheng Wang, James J. Knierim, May-Britt Moser & Edvard I. Moser 


The encoding of time and its binding to events are crucial for episodic memory, but how these processes are carried out 
in hippocampal-entorhinal circuits is unclear. Here we show in freely foraging rats that temporal information is robustly 
encoded across time scales from seconds to hours within the overall population state of the lateral entorhinal cortex. 
Similarly pronounced encoding of time was not present in the medial entorhinal cortex or in hippocampal areas CA3-CA1. 
When animals’ experiences were constrained by behavioural tasks to become similar across repeated trials, the encoding 
of temporal flow across trials was reduced, whereas the encoding of time relative to the start of trials was improved. The 
findings suggest that populations of lateral entorhinal cortex neurons represent time inherently through the encoding of 
experience. This representation of episodic time may be integrated with spatial inputs from the medial entorhinal cortex 
in the hippocampus, allowing the hippocampus to store a unified representation of what, where and when. 


The representation of time is a crucial component of episodic mem- 
ory’ ’. Although a considerable body of work has now demonstrated 
that the hippocampus has an essential role in generating a representa- 
tion of time*!!, our understanding of how the brain represents time 
for episodic memory (episodic time) is still in a nascent stage. The 
primary function of episodic time is to record the order of events within 
experience, which does not require a precise representation of metric 
time, differentiating it from interval and circadian timing!*-'*. Rather 
than being able to keep precise metric time, the neural code for episodic 
time should have the following two fundamental properties: 1) the code 
should arise automatically without any behavioural training, to support 
one-shot formation of episodic memory, and 2) the code should be able 
to capture the different scales of time at which experience may occur. 
Recently, two types of representation of time have been observed in the 
hippocampus and medial entorhinal cortex (MEC): time cells, which 
fire at specific points in time as an animal performs a task'*-!°, and the 
decorrelation of place cell activity across hours to days*” >. However, 
neither of these representations of time has been shown to fully support 
one-shot formation of episodic memories in combination with variable 
timescales. Furthermore, how either of these representations of time 
arises is unknown. Here, we investigated temporal coding outside the 
place-cell system, in the lateral entorhinal cortex (LEC). We focused on 
the LEC because (i) this area is a major source of cortical input to the 
hippocampus, (ii) previous work has shown that responses in the LEC 
to physical stimuli could be unstable across time”°’, and (iii) a clear 
underlying function has not yet been defined for the LEC. We found a 
representation of time in the LEC that exhibited both of the expected 
signatures of episodic time, and thus could support episodic memory. 


Temporal coding in individual LEC cells 

To explore temporal coding in the LEC, we recorded neural activity 
over more than an hour while rats ran in a box in which the colour of 
the walls alternated between black and white in a fixed pattern over 12 
trials (BW12 experiment, Fig. 1a, Extended Data Fig. 1). An extended 
number of trials was used to increase the likelihood that animals 
defined multiple temporal contexts across the experiment, and an 
interleaved design was chosen to avoid confounding changes in wall 


colour with progression of time. Data were also recorded from the CA3 
and MEC for comparison. Examining LEC responses by eye, we noticed 
that some cells exhibited clear ramping activity (Fig. 1a, b), raising the 
possibility that the passage of time can be tracked through the firing 
rates of individual LEC cells. Responses to specific environmental 
features such as walls and cue cards”° were also observed, consist- 
ent with the established role of the LEC in encoding environmental 
context?*-99, 

We quantified the influence of wall colour and time on the activity 
of single cells using a generalized linear model (GLM) incorporating 
time, wall colour and position as variables for fitting the firing rates 
of individual neurons, which were binned temporally into blocks of 
500 ms (Extended Data Fig. 2a–d). A considerable number of LEC 
cells were selective specifically for time (20.4% of all recorded cells), 
whereas only 2.0% of CA3 cells and 4.5% of MEC cells were selective for 
time alone (number of cells significantly influenced by at least one 
variable for LEC: 186 out of 451, 41.2%; CA3: 72 out of 148, 48.6%; 
MEC: 49 out of 133, 36.8%; Fig. 1c). The distributions of cell selectivity 
for the LEC, CA3 and MEC were consistent across individual animals 
(Fig. 1c). 

Because time was modelled as a linearly increasing function in the 
GLM, all identified time-selective cells exhibited some form of ramping 
activity. Both increasing and decreasing ramps across a range of time 
constants were observed (Extended Data Fig. 2c), as would be expected 
if they performed a Laplace transform of the recent past*’. Ramping 
cells were found in both deep and superficial layers, with no clear differ- 
ence in the time constants (see Supplementary Information). Ramping 
responses, particularly across the entire recording session, were not due 
to recording instability (Extended Data Fig. 2e-g, see Methods). We 
found little evidence in the LEC for non-ramping time-specific activity 
similar to that of time cells (Extended Data Fig. 3). 
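To make the link between a linear time regressor and ramping activity concrete, the sketch below fits a one-variable Poisson GLM with a log link to binned spike counts. It is an illustration under stated assumptions rather than the model used in the study: the Poisson/log-link choice, the function name `fit_poisson_ramp` and the synthetic noise-free counts are all hypothetical, and wall colour and position are omitted.

```python
import math


def fit_poisson_ramp(times, counts, n_iter=50, tol=1e-10):
    """Fit counts ~ Poisson(exp(b0 + b1 * t)) by Newton's method.

    A positive b1 corresponds to an increasing ramp and a negative b1 to a
    decreasing ramp; |b1| sets how quickly the rate changes.
    """
    b0 = math.log(max(sum(counts) / len(counts), 1e-9))  # start at mean rate
    b1 = 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for t, c in zip(times, counts):
            mu = math.exp(b0 + b1 * t)
            g0 += c - mu            # gradient of the log-likelihood
            g1 += (c - mu) * t
            h00 += mu               # entries of X^T W X (negative Hessian)
            h01 += mu * t
            h11 += mu * t * t
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# One 250-s trial in 500-ms bins gives 500 bins; time is rescaled to [0, 1].
n_bins = 500
times = [i / (n_bins - 1) for i in range(n_bins)]
# Synthetic expected counts for an increasing ramp (no noise, illustration only).
true_b0, true_b1 = 0.5, 1.0
counts = [math.exp(true_b0 + true_b1 * t) for t in times]

b0_hat, b1_hat = fit_poisson_ramp(times, counts)
print(f"fitted intercept {b0_hat:.3f}, fitted time coefficient {b1_hat:.3f}")
```

In a model of this form, any cell with a significant time coefficient is by construction a ramping cell, which is why non-ramping time-specific activity has to be tested for separately.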


LEC population states encode temporal information 

We next focused on the overall population of LEC cells and asked 
whether its dynamics reflected time coding. We first visualized the over- 
all population data from individual animals using linear discriminant 
analysis to determine the dimensions that optimally discriminate the 
Fig. 1 | Temporal information within LEC single-cell activity. 

a-c, Experimental design: animals ran 12 250-s trials in a box with either 
black or white walls. Trials were separated by 140-s intertrial periods in 
which the animal was placed in a holding pot. The total session length 
was 1.3 h. a, Four example LEC cells. Left, path plots showing the animal’s 
location in grey, and cell spikes in red. Right, firing rate plots for the same 
cells. Dark and light grey indicate trial periods with black (B) and white 
(W) walls, respectively. Unshaded regions indicate the intertrial periods. 
Cells 1-3 responded to features of the recording box with no obvious 
temporal component to their activity, whereas cell 4 exhibited ramping 
activity across each trial. b, Left, example GLM fit results for four cells 
with selectivity for different features, with the observed firing rate shown 
in grey, and predicted firing rate in blue. Right, average tetrode waveforms 
for the first (green) and last (black) quarter of the session. c, Distribution 
of selectivity for wall colour, position, time and mixtures of variables for 
the LEC (top, n = 3 animals with 90, 141 and 220 cells), CA3 (middle, n = 3 
animals with 78, 42 and 28 cells) and MEC (bottom, n = 2 animals with 
31 and 102 cells). Circles indicate individual animals, solid lines indicate 
mean ± s.e.m., shading indicates the same individual across the different 
variables. 


24 experiment-defined states (12 trials, 12 intertrials), and plotting the 
2D projection that yielded the best discrimination (Fig. 2, Extended 
Data Fig. 4a–c). Population activity for the entire session separated 
into distinct modes corresponding to the environmental contexts of 
the experiment, as expected from previous work demonstrating LEC 
involvement in encoding context?*-*. In addition, there was a promi- 
nent progression of states corresponding to the temporal order of the 
experiment. Thus, across the entire session, the three environmental 
contexts could be separated along one axis in state space, while the 
temporal epoch of each trial could be separated along another. 

To quantify the temporal information present, we trained a linear 
multiclass support vector machine to identify temporal epochs (trial 1, 
intertrial 1, trial 2, and so on) based on neural activity from individ- 
ual animals, pooled across recording sessions. Population activity 
Fig. 2 | Visualizing LEC population activity. 2D projections of 

neural population responses. Axes correspond to the first two linear 
discriminants (LD1 and LD2; arbitrary units). Left column shows LEC 
population responses, middle column shows CA3 population responses, 
right column shows MEC population responses, each from an example 
animal. Each trial’s wall colour is indicated by a shade of green (black 
walls) or purple (white walls); intertrial periods are shown in grey. 
Progression of shade from dark to light for trial and intertrial periods 
indicates the progression of time. Note progression by time for the LEC 
(see also Extended Data Fig. 4a). Linear discriminant analysis projections 
were used for visualization purposes only. 


defined by the firing rates of cells was binned into 10 s bins, with each 
bin labelled by the temporal epoch it was in. The decoder was then 
trained to identify temporal epochs based on population activity, with 
tenfold cross-validation (Extended Data Fig. 4d). Statistical significance 
was evaluated using a permutation method (see Methods). Very high 
decoding accuracy for temporal epoch identity across the whole session 
spanning both trial and intertrial periods was observed for all LEC 
animals during the BW12 experiment, indicating substantial tempo- 
ral information was present in LEC population activity (88.0% mean 
accuracy, chance level 4.2%; Fig. 3a). The high decoding accuracy was 
not due to pooling cells across recording days, changes in behaviour 
or noise-driven variability of population activity states (Extended Data 
Figs. 4e, f, 5). To verify our observations further, we trained decoders 
using data from additional LEC animals that ran a simplified four-trial 
version of the BW12 experiment (BW4), and obtained similar results 
(93.1% mean accuracy, chance level 25%; Extended Data Fig. 4g, h). 
Thus, population activity in the LEC clearly defined a unique temporal 
context for every epoch of experience on the timescale of minutes. 
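The decoding logic described above (label each time bin with its epoch, cross-validate a classifier, then compare against label-shuffled data) can be sketched as follows. This is a simplified stand-in, not the analysis used in the study: a nearest-centroid classifier replaces the linear multiclass support vector machine so that the example needs only the standard library, the population data are simulated, and all function names are hypothetical.

```python
import random


def fit_centroids(X, y):
    """Return {epoch label: mean population vector} for the training bins."""
    sums, counts = {}, {}
    for vec, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, vec):
    """Assign a bin to the epoch whose centroid is closest."""
    def sqdist(c):
        return sum((a - b) ** 2 for a, b in zip(vec, c))
    return min(centroids, key=lambda lab: sqdist(centroids[lab]))


def cv_accuracy(X, y, k=10, seed=0):
    """k-fold cross-validated decoding accuracy (fraction of bins correct)."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        cents = fit_centroids([X[i] for i in train], [y[i] for i in train])
        correct += sum(predict(cents, X[i]) == y[i] for i in fold)
    return correct / len(X)


def permutation_test(X, y, n_perm=100, seed=1):
    """Compare observed accuracy with accuracies obtained after shuffling labels."""
    rng = random.Random(seed)
    observed = cv_accuracy(X, y)
    shuffled = list(y)
    n_at_least = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if cv_accuracy(X, shuffled) >= observed:
            n_at_least += 1
    return observed, (n_at_least + 1) / (n_perm + 1)


def simulate_population(n_epochs=4, bins_per_epoch=20, n_cells=10, seed=42):
    """Simulated rates: each cell drifts across epochs with its own slope."""
    rng = random.Random(seed)
    X, y = [], []
    for epoch in range(n_epochs):
        for _ in range(bins_per_epoch):
            X.append([5.0 + (cell + 1) * epoch + rng.gauss(0.0, 0.5)
                      for cell in range(n_cells)])
            y.append(epoch)
    return X, y


X, y = simulate_population()
accuracy, p_value = permutation_test(X, y)
print(f"accuracy {accuracy:.2f} (chance {1 / 4:.2f}), permutation P = {p_value:.3f}")
```

With equally sized epochs, chance is one over the number of epochs, which is how 24 epochs give a chance level of about 4.2%.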

Visualization of data from the CA3 and MEC revealed population 
activity patterns that were different from those observed in the LEC 
(Fig. 2). Quantification of temporal information across the entire 
session, including both trial and intertrial periods, using decoders 
trained on CA3 and MEC data revealed decoding accuracies above 
chance, but lower than that found in the LEC (23.0% and 34.4% mean 
accuracy for CA3 and MEC, respectively). To compare decoding 
accuracy properly across the three areas, we trained separate decoders 
for each area using populations with equal size to ensure that higher 
accuracy was not simply due to the data having higher dimensionality. 
The LEC still contained considerably more temporal information than 
either the CA3 or MEC, as decoding accuracy for temporal epoch was 
higher for the LEC by a sizeable margin (45.5%, 20.4% and 26.2% mean 
accuracies for LEC, CA3 and MEC, respectively; Fig. 3b, c). The differ- 
ence in decoding accuracy was maintained across a range of population 
sizes, with the LEC requiring significantly fewer cells to reach high 
decoding accuracy (Fig. 3d, Extended Data Fig. 4i). In the simplified 
BW4 experiments, we also had data from the CA2 and CA1, allow- 
ing us to compare decoding accuracies for temporal epoch across the 
entorhinal cortex and all the CA subfields. We again found, in separate 
animals, that decoding accuracy was highest for the LEC (Extended 
Data Fig. 4h). In total, these observations point to the LEC as a possible 
source of temporal context information necessary for episodic memory 
formation in the hippocampus. 
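The size-matching step can be sketched as random subsampling of every population down to a common number of cells. The per-animal cell counts below are those reported in Fig. 1c; taking the smallest count as the matched size is an assumption that happens to agree with the 28 cells quoted in Fig. 3b, and the helper names are hypothetical.

```python
import random

# Cells recorded per animal in the BW12 experiment (from Fig. 1c).
cells_per_animal = {
    "LEC": [90, 141, 220],
    "CA3": [78, 42, 28],
    "MEC": [31, 102],
}


def matched_size(counts_by_area):
    """Largest population size available in every animal (assumed rule)."""
    return min(n for counts in counts_by_area.values() for n in counts)


def subsample_cells(n_cells, size, rng):
    """Draw `size` distinct cell indices out of n_cells, without replacement."""
    return sorted(rng.sample(range(n_cells), size))


size = matched_size(cells_per_animal)
rng = random.Random(0)
subsamples = {
    area: [subsample_cells(n, size, rng) for n in counts]
    for area, counts in cells_per_animal.items()
}
print(f"matched population size: {size} cells")
```

Training each area's decoder on equally sized subsamples, and repeating the draw many times, keeps differences in accuracy from simply reflecting how many cells were recorded.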

The robust encoding of time in the LEC at the population level could 
be due primarily to cells with ramp-like activity. To test whether cells 
Fig. 3 | Temporal information within LEC population activity. 

a, Decoding accuracies for temporal epoch across the whole recording 
session for the LEC (n = 3 animals), CA3 (n = 3 animals), and MEC 
(n = 2 animals). b, As in a, but for size-matched populations. *P < 0.05 
(LEC versus MEC, t(3) = 4.81), **P < 0.01 (LEC versus CA3, t(4) = 5.77), 
unpaired t-test with Bonferroni correction; matched population size = 28 
cells. c, Confusion matrices from example animals, using size-matched 
populations. Each matrix shows the entire session, with each entry 
corresponding to a single epoch (trial 1, intertrial 1, trial 2 and so on). 
Epoch type is indicated along the left and bottom (grey for intertrial 
periods, black for black wall trial periods, white for white wall trial 
periods). d, Relationship between population size and decoding accuracy. 
Lines indicate exponential curve fit to data (shown as points, pooled 
across n = 3, 3 and 2 animals for LEC, CA3 and MEC, respectively). 

e, Decoding accuracy for temporal epoch using subpopulations with all 


not classified as selective for time nonetheless also encoded temporal 
information, possibly in a nonlinear form*, we trained decoders for 
temporal epoch using only cells that were not selective for time, and 
found that decoding accuracy remained very high. Comparing against 
size-matched populations with cells drawn randomly from the entire 
dataset, decoding accuracy for time using only non-time-selective cells 
was not significantly different (77.2% versus 79.2% mean accuracy for 
trial identity using a population with no time-selective cells, compared 
to a size-matched randomly drawn population; Fig. 3e). 

If the temporal information observed within LEC population 
activity is actually used as a temporal code, an intriguing question is 
whether it is limited to representing just the macroscopic temporal 
context of experiences, on the timescale of minutes and longer, or 
whether it is also capable of representing the order of events within 
experiences, which may be on timescales shorter than minutes. We 
examined whether the LEC could encode temporal epochs of shorter 
length than entire trial or intertrial periods by dividing trial-period data 
into shorter epochs and training decoders to identify these shortened 
epochs (Fig. 3f). Decoding accuracies for 20-s, 10-s and 1-s epoch- 
lengths across the entire session were all significantly above chance 
(27.8%, 21.4% and 1.6% mean accuracy for 20-s, 10-s and 1-s epochs, 
respectively; chance levels 0.4%, 0.2% and 0.02%, respectively; Fig. 3f, g, 
Extended Data Fig. 6a–c and Supplementary Information). Although 
decoding accuracy decreased with shorter epoch lengths, decoding 


time-selective cells removed, compared to randomly drawn size-matched 
controls. NS, not significant (P = 0.42; t(2) = 1.00), paired t-test; n = 3 
animals with 62, 107 and 150 cells not selective for time. f, Decoding 
accuracies for different temporal epoch lengths for the LEC. Left, 

method for constructing shortened epochs. Right, decoding accuracies 

for different epoch lengths (n = 3 animals, matched population size = 90 
cells). g, Confusion matrix for 20-s epochs from example animal. The 
matrix contains 228 epochs (each trial period truncated to 240 s and 
divided into 12 20-s epochs, each intertrial period of 140 s divided into 

7 20-s epochs, 19 epochs per trial/intertrial pair, 12 total pairs across the 
session, giving 228 epochs). Left, confusion matrix for the entire session. 
Right, sum across trial/intertrial sections of the whole session matrix 
(outlined by dashed line in whole session matrix). a, b, d, f, Circles indicate 
individual animals, solid lines indicate mean decoding accuracies ± s.e.m., 
dashed lines indicate chance levels. 


errors predominantly predicted adjacent epochs, as opposed to 
temporally distant epochs (Fig. 3g). Decreased accuracy may have been 
due to the limited number of cells that we recorded from. By pooling 
together all recorded cells from every animal to generate a population 
of 451 cells, the decoding accuracy for 20-s epochs was 86.0%, and the 
decoding accuracies for 10-s and 1-s epochs were 83.0% and 30.5%, 
respectively. Decoding accuracy for shortened epochs was not due to 
differences in population activity on the timescale of trial or intertrial 
periods, as decoding accuracy for shortened epochs within single trial 
and intertrial periods was still above chance (Extended Data Fig. 6d, e). 
Thus, temporal information in the LEC was flexible enough to support 
the encoding of events happening across a wide range of timescales. 
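
The epoch counts and chance levels quoted above follow from the session structure given in the Fig. 3g legend (12 trial/intertrial pairs, trial periods truncated to 240 s, intertrial periods of 140 s). The following is a minimal sketch of that arithmetic, assuming that chance is one over the number of epoch classes; the function name `epoch_count` is illustrative and is not taken from the original analysis code.

```python
def epoch_count(epoch_s, trial_s=240, intertrial_s=140, n_pairs=12):
    """Number of fixed-length epochs in a session of trial/intertrial pairs."""
    per_pair = trial_s // epoch_s + intertrial_s // epoch_s
    return per_pair * n_pairs


for epoch_s in (20, 10, 1):
    n = epoch_count(epoch_s)
    # Chance level is assumed here to be 1 / (number of epoch classes).
    print(f"{epoch_s}-s epochs: {n} classes, chance = {100 / n:.2f}%")
```

Under these assumptions the session splits into 228, 456 and 4,560 epochs, which agrees with the reported chance levels of roughly 0.4%, 0.2% and 0.02%.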

In a separate set of experiments, we tested whether changing the 
content of the animal’s experience affected temporal information in 
the LEC by introducing an object into the recording environment. 
We found a similar representation of time at both the single-cell and 
population level (Extended Data Fig. 7). Information about the current 
environmental context (B or W, Extended Data Fig. 4j-l) as well as
information about the immediately preceding context was also present 
in the LEC (Supplementary Information). Overall, these results suggest 
a large component of LEC population activity codes for time across 
multiple scales, expressed both through cells explicitly coding time as 
well as cells having mixed selectivity for time that requires population 
decoding to extract. 
Fig. 4 | Temporal information arises inherently. a, Continuous
alternation task, in which the animals alternated between left and right 
turns when they reached the top of the central stem for 40 total trials. 
Black circle indicates the base of the central stem. b, Left, 2D projection 
of LEC neural population responses during the figure-eight experiment 
from example animal. Right, 2D projection of LEC neural population 
responses during matched periods from BW experiments. c, Left, 
decoding accuracy for trial identity during the figure-eight experiment 
compared to decoding accuracy for temporal epoch using matched data 
from BW experiments. ***P < 0.0001 (t(8) = 8.35), unpaired t-test;
matched population size = 31 cells, n = 3 and 7 animals for figure-eight
and BW, respectively. Circles indicate individual animals, solid lines
indicate mean decoding accuracies ± s.e.m., dashed lines indicate chance
levels. Right, confusion matrix for the figure-eight experiment from 
example animal. 


Temporal information arises inherently 

There are several possible mechanisms by which time may be incor- 
porated into the population representation in the LEC. One possibility 
(‘explicit mechanism’) is that the LEC actively, in a clock-like man-
ner, generates timestamps for representations of experience. Another
possibility (‘inherent mechanism’) is that temporal information in the
LEC arises simply because the animal’s moment-to-moment experience 
constantly changes, and time can be extracted from this changing flow 
of experience by integrating the amount of change (Extended Data
Figs. 8, 9a). We sought to distinguish these two possibilities by con- 
straining experience through the use of a more structured task in which 
the animal's behaviour was stereotyped. If the temporal information 
present within the LEC arose from an explicit clock-like process, we 
would expect to see no change in the amount of temporal information 
present when compared to results from the free-foraging BW12/BW4 
experiments. By contrast, if the temporal information within the LEC 
arose inherently through its encoding of experience, we would expect 
to see a decrease in the amount of temporal information due to the 
repetitive nature of the task. 

We examined the variability of temporal representation in the LEC 
in a separate experiment in which animals performed a learned contin- 
uous-alternation task (figure-eight task, Fig. 4a). Trials were aligned by 
the time point at which the animal entered the central stem, and trials 
consisted of activity spanning from three seconds before to three sec- 
onds after entering the central stem. Visualizing the data, we observed 
that the separation between trials was reduced in comparison to BW 
data (Fig. 4b). Decoding accuracy for trial identity across the entire 
session was above chance, but much lower than for matched BW data 
(10.6% versus 21.5% mean accuracy for figure-eight and BW respec- 
tively, chance level 5%; Fig. 4c, Extended Data Fig. 9b). In a separate 
Fig. 5 | Temporal coding depends on behavioural context. a, Proportion
of cells exhibiting selectivity for session time, trial time or trial type for
the figure-eight (light blue), BW12 (purple), and object experiments
(magenta). Session/mixed session time: *P < 0.05 (figure-eight versus
BW12, t(4) = 4.35), **P < 0.01 (figure-eight versus object, t(4) = 10.10).
Trial/mixed trial time: **P < 0.01 (figure-eight versus BW12, t(4) = 6.70;
figure-eight versus object, t(4) = 7.28). Trial type: **P < 0.01 (figure-eight
versus BW12, t(4) = 5.56), *P < 0.05 (figure-eight versus object,
t(4) = 4.45), unpaired t-test on arcsine-transformed data with Bonferroni
correction; n = 3 animals for all. b, 2D projection of neural trajectories
for LEC data during the figure-eight experiment from example animal.
Axes correspond to the first two principal components (PC1 and
PC2; arbitrary units). c, Decoding accuracy for trial time compared to
matched BW data. ***P < 107!” (t(8) = 38.98), unpaired t-test; matched
population size = 31 cells, n = 3 and 7 animals for figure-eight and BW,
respectively. d, Decoding accuracy for trial time during the figure-eight
experiment using subpopulations with all spatially selective cells removed,
compared to randomly drawn size-matched controls. NS, not significant
(P = 0.16, t(2) = 2.17), paired t-test; n = 3 animals with 38, 44 and 22
cells not selective for space. e, Decoding accuracy for trial type during
the figure-eight experiment compared to decoding accuracy for wall
colour using matched BW data. *P < 0.05 (t(8) = 2.74), unpaired t-test;
matched population size = 31 cells, n = 3 and 7 animals for figure-eight
and BW, respectively. f, Decoding accuracy for trial type (left) or trial time
(right) during the figure-eight experiment, with cells selective for decoded
variable removed, compared to randomly drawn size-matched controls.
*P < 0.05 (trial type: t(2) = 4.83; trial time: t(2) = 4.40), paired t-test; n = 3
animals with 27, 29 and 8 cells not selective for trial time, and 29, 31 and
15 cells not selective for trial type. Circles indicate individual animals,
solid lines indicate mean ± s.e.m. of described measurement, dashed lines
indicate chance levels.
experiment in which animals ran repeated laps on a circular track,
reduced decoding accuracy for trial identity during repetitive experi- 
ence was also observed (Extended Data Fig. 9c-f). Overall, these results 
are consistent with temporal information in the LEC arising not in an 
explicit clock-like manner, but inherently from the dynamics underly- 
ing the representation of ongoing experience in the LEC. 


Temporal coding varies with changing behaviour 

The hippocampus represents learned time intervals, and this
representation of time may depend on the retrieval of stable temporal
contexts. The reduction in across-session temporal information
during learned behaviour may reflect a similar instatement of a stable 
temporal context in the LEC for the purpose of representing the 
relevant features of the task, which may include the progression of 
time within single trials. Analysis of single-cell responses using a 
GLM incorporating trial type (left/right turn), trial time (time within 
single trials), and session time showed that the ratio of cells encod- 
ing trial time versus session time was significantly altered, with more 
cells encoding trial time and fewer cells encoding session time in the 
figure-eight experiment (5.7%, 18.2% and 34.0% cells selective for 
session time or mixed session time (session time + trial type) for 
the figure-eight, BW12 and object experiments, respectively; 35.7%,
6.2% and 7.7% cells selective for trial or mixed trial time (trial 
time + trial type) for the figure-eight, BW12 and object experiments, 
respectively; Fig. 5a, Extended Data Fig. 10b-e). The proportion of 
cells selective for trial type also differed significantly (20.8%, 4.9% and 
8.0% cells selective for context for the figure-eight, BW12 and object 
experiments, respectively; Fig. 5a). 

Visualization of overall population activity using principal com- 
ponent analysis to plot 2D projections of neural trajectories through 
population activity space showed that neural trajectories were relatively 
constant across trials, suggesting that the LEC was in fact in a different, 
more stable mode of activity compared to that observed during free 
foraging (Fig. 5b, Extended Data Fig. 10f). Consistent with this obser- 
vation, decoding accuracy for time relative to the start of each trial 
was significantly above chance and much higher than for matched BW 
data (45.3% versus 18.3% mean accuracy, chance level 16.6%; Fig. 5c). 
Although trial time was tightly correlated with position, the significant 
decoding of trial time did not seem to be due purely to LEC activity 
reflecting spatial location (10.0% cells selective for trial time exclu- 
sively with no influence of position, Fig. 5d). In addition to changes 
in the type of temporal information present in LEC population activity, 
the amount of task-related information also appeared to change, as 
the decoding accuracy for trial type was higher for the figure-eight 
data than the decoding accuracy for wall colour using matched BW 
data (81.3% versus 72.1% mean accuracy, chance level 50%; Fig. 5e). 
Finally, the degree to which information was distributed across the 
entire population was decreased compared to BW experiments, as 
decoding accuracies for both trial type and trial time were significantly
reduced compared to size-matched randomly drawn controls when 
cells selective for trial type or trial time respectively were removed 
(56.1% versus 76.4% mean accuracy for trial type, 32.2% versus 41.9% 
mean accuracy for trial time; Fig. 5f). Overall, our results suggest that 
as animals engaged in a structured, learned task, the dynamics of LEC 
activity became considerably more stable compared to when animals 
were engaged in free behaviour. 


Discussion 

Being able to recall the temporal details of past experiences is a 
fundamental element of episodic memory. Our recordings demon- 
strate a unique temporal signal in the LEC that can encode time across 
multiple scales from seconds to hours and across different environ- 
mental contexts. Ordinarily this code for time marks the free-flowing 
progression of time, reflecting the structure of ongoing experience. 
However, when animals engage in a structured behavioural task in 
which experience is similar across repeated trials, time coding becomes 
relative, encoding time with respect to temporal landmarks. The 
adaptable nature of this code for time makes it particularly well-suited
for defining the temporal component of episodic memory (episodic
time) and differentiates it from previously described interval-timing
mechanisms.

Our results support recent theoretical work demonstrating that 
encoding of time can arise inherently as a result of interactions between 
externally and internally driven states. This form of time coding
represents different points in time using different high-dimensional 
population states that can be easily differentiated by downstream 
readout neurons (Extended Data Fig. 8). In the LEC, such high- 
dimensional states may be generated by a combination of the recurrent 
local connectivity of the LEC and the uniquely diverse set of inputs
that the LEC receives. Given the anatomical position of the LEC as a
major gateway for information entering the hippocampus, it is possi-
ble that representations of time outside of the entorhinal/hippocampal
circuit are integrated in the LEC to form a single representation of
time for episodic memories. 

Within the entorhinal/hippocampal circuit, two representations of 
time have been identified in previous work: time cells, which fire at 
specific points in time as an animal performs a task, and the decor-
relation of place cell activity across hours to days. Continuously
changing LEC activity may underlie both scales of temporal representa- 
tion: drift of place cell activity may be governed by a constantly chang- 
ing LEC input that is time-varying on the scale of minutes to hours, 
and sequential activity of time cells may be driven in part by LEC input 
that is time-varying relative to task events on the scale of seconds.
Thus, although an episodic memory may contain both a fine-grained 
representation of the sequence of events composing that memory as 
well as a coarser temporal context for the overall episode, both of these 
scales of temporal representation may originate from a single temporal 
signal within the LEC. This signal may then reach the hippocampus to 
become part of a unified what-when-where representation of experi-
ence, space and time, in which the representation of experience and
time arising in the LEC is integrated with the representation of space
arising in the MEC.
METHODS


Subjects. Experiments were carried out using twenty-one male Long-Evans rats at
NTNU and two male Long-Evans rats at Johns Hopkins University. Animals were
housed individually in Plexiglas cages. Ten rats had tetrodes in the LEC, two rats
had tetrodes in the LEC and CA3, two rats had tetrodes in the MEC, and nine rats
had tetrodes in the CA3, CA2 or CA1. Data from seven of the LEC animals and
the nine animals with tetrodes solely in the hippocampus have been published
previously, but those studies examined the activity of individual cells rather than
the overall population. Animals were maintained on a 12-h light/12-h dark schedule, and kept
at 85-90% of free-feeding body weight. Experiments were performed in accord- 
ance with the Norwegian Animal Welfare Act and the European Convention for 
the Protection of Vertebrate Animals used for Experimental and Other Scientific 
Purposes, or the Institutional Animal Care and Use Committee at Johns Hopkins 
University. The study contained no randomization to experimental treatments and 
no blinding. Sample size (the number of animals) was set based on conventions 
in the field. 

Surgery and electrode preparation. Rats were anaesthetized with isoflurane 
(air flow: 1.0 l min⁻¹, 0.5-3% isoflurane, adjusted according to physiological mon-
itoring). For LEC recordings, either one microdrive with four tetrodes (10 rats)
or one hyperdrive with 18 tetrodes was targeted to the LEC (2 rats). Coordinates 
for microdrive LEC animals were: anterior-posterior (AP): 0-0.2 mm anterior 
to lambda, medial-lateral (ML): 5.3-5.5 mm lateral to midline, dorsal-ventral 
(DV): 4.3-5.0 mm below dura, 4-8° angle in the coronal plane, with electrode tips 
pointing away from midline. Coordinates for hyperdrive LEC animals were: AP: 
7.5-7.6 mm posterior to bregma, ML: 3.0 mm lateral to midline, DV: at brain sur- 
face, 25° angle in the coronal plane, with electrode tips pointing away from midline. 
For MEC recordings, one microdrive was targeted to the MEC with coordinates: 
AP: 0.2-0.4 mm anterior to the transverse sinus, ML: 4.6 mm lateral to midline, 
DV: 1.8 mm below dura, 20° angle in the sagittal plane, with electrode tips pointing 
towards bregma. For hippocampal recordings, either one microdrive was targeted 
to the CA3, with coordinates: AP: 3.8 mm posterior to bregma, ML: 3.0 mm lateral
to midline, DV: 1.5 mm below dura (3 rats), or one hyperdrive was placed over 
hippocampus with coordinates: AP: 2.5-4.3 mm posterior to bregma, ML: 2.2-3.8 mm 
lateral to midline (9 rats). Drives were fixed to the skull using jeweller’s screws 
and dental cement. One screw served as a ground for each drive. Tetrodes were 
made from four twisted 17-µm polyimide-coated platinum-iridium (90-10%) wires
(California Fine Wire). Electrode tips were plated with platinum to reduce elec-
trode impedances to between 150 and 300 kΩ at 1 kHz.

Recording procedures. Microdrives were connected to a multi-channel unity gain 
headstage, which was connected via a cable to a Neuralynx recording system for 
BW12 and circular-track experiments, or an Axona recording system for the BW4, 
object and figure-eight experiments (Axona Ltd). Unit activity was amplified by a 
factor of 1,000-10,000 and band-pass filtered from 600 to 6,000 Hz for Neuralynx 
recordings or from 800 to 6,700 Hz for Axona recordings. Spike waveforms above a 
set threshold were time-stamped and digitized at 32 kHz for 1 ms for all recordings. 
Tetrodes were lowered in 50-µm steps while the rat rested on a towel in a flower pot
on a pedestal. Turning stopped when well-separated units appeared. Data collec-
tion started when signal amplitudes exceeded approximately five times the noise
level (root mean square 20-50 µV) and units were stable for >3 h. Animal position
was tracked at 25-30 Hz for Neuralynx recordings or 50 Hz for Axona recordings 
using an overhead video camera and either two (for microdrive animals) or mul- 
tiple (for hyperdrive animals) LEDs attached to the headstage. 

Behavioural procedures. For BW12, BW4 and object experiments, animals were 
trained to collect randomly scattered chocolate cereal crumbs in a square box. For 
BW12 experiments, animals ran in an 80 × 80 × 50 cm box with interchangeable
walls, a single cue card along one wall, and curtains on two sides of the box. For
BW4 experiments, the box was 100 × 100 cm. For object experiments, the box was
100 × 100 cm, the walls were black with a single cue card, and no curtains were
present, such that there were many distal cues. Once animals were able to achieve 
good coverage of the recording environment, they began running standard sessions 
of 12 trials for BW12 experiments or 4 trials for BW4 experiments, during which 
the wall colour was changed across trials. For object experiments, animals ran 
3 trials, with an object (a 6 × 6 × 37 cm tower made of Lego)
placed in a fixed location during the second trial.

For circular track experiments, animals were trained to run back and forth 
between food wells for food pellets (BioServ) on a circular track (diameter 97 cm, 
width 10 cm). The food wells were separated by a 15 cm tall black barrier with 
0.4 cm sidewalls. For the figure-eight experiments, animals were trained to run in 
a figure-eight pattern on a square figure-eight maze made of plexiglass with vinyl 
flooring which had runways 15 cm wide with 2 cm high side walls, a central stem 
that was 150 cm long, and total dimensions of 150 × 150 cm. Rewards were placed
in dishes at the end of each goal arm. The maze was elevated 50 cm above the
ground, surrounded by black curtains with a cue card 100 × 25 cm on the left side
of the maze. Animals were trained in three stages. In the first stage, wood-block
barriers were placed at the bottom and top of the central stem so that rats could
only run a fixed path. The position of the barriers was alternated on each trial once 
the rat reached the reward zone, and animals remained at this stage until they ran 
>20 trials within a 20-min session. In the second stage, the barrier at the top of 
the central stem was phased out so that animals could enter either reward arm. 
Only correct alternating choices were rewarded, and animals were blocked from 
backtracking to explore the stem or other reward arm once a choice was made. In 
the third and final stage, the barrier at the bottom of the central stem was phased 
out, and animals had to continue to run in a figure-eight pattern, with backtracking 
blocked by a barrier. After reaching the third stage, animals ran 20-min sessions, 
averaging >50 trials per session with >95% correct choices. 

Spike sorting and cell classification. Spike sorting was performed offline using 
graphical cluster-cutting software to examine two-dimensional projections of the 
multidimensional parameter space (Tint (N. Burgess) for Axona data; MClust 
(A. D. Redish) or custom-written spike-sorting software for Neuralynx data). Spike 
clusters were compared across successive days to ensure that the same cell was not 
counted twice. 

Cell stability across BW12 experimental sessions was determined by checking 
for stationarity of spike waveforms. For each cell, the Euclidean distance between 
spike waveforms across the recording session and the average waveform of the 
first ten spikes was measured. For each spike, the four spike waveforms were com- 
pressed into a single point in a 128-dimensional space (4 waveforms, each with 
32 values), and Euclidean distance was measured in this 128-dimensional space. 
Subsequently, the stationarity of this distance was determined using three tests:
the augmented Dickey-Fuller test, the Kwiatkowski-Phillips-Schmidt-Shin test, and
the Ljung-Box test, with stable cells being those which were categorized as station-
ary across all three tests. Following this stability test, stability was also assessed by
calculating the Pearson correlation between the measured Euclidean distances 
across the recording session and firing rate across the recording session. Cells were 
excluded from further analysis if their correlation values exceeded chance levels. 
Chance levels were estimated individually for each cell by temporally shuffling the 
firing rate for a given cell and calculating the correlation between the waveform dis- 
tance of the cell and the shuffled firing rates, repeating this process 1,000 times, and 
taking the 95th percentile as the threshold. Firing rate was estimated for 500-ms 
bins, with no smoothing or additional pre-processing. 
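
The shuffle-based exclusion criterion described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the function names are hypothetical, the signed (not absolute) correlation is compared against the threshold, and the inputs are assumed to be per-bin waveform distances and firing rates of equal length with non-zero variance.

```python
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def shuffle_threshold(distance, rate, n_shuffles=1000, percentile=95, seed=0):
    """95th percentile of correlations obtained after shuffling the firing rate."""
    rng = random.Random(seed)
    shuffled = list(rate)
    null = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # temporal shuffle of the firing-rate bins
        null.append(pearson(distance, shuffled))
    null.sort()
    idx = min(len(null) - 1, int(len(null) * percentile / 100))
    return null[idx]


def is_excluded(distance, rate, **kwargs):
    """True if the observed correlation exceeds the shuffled chance level."""
    return pearson(distance, rate) > shuffle_threshold(distance, rate, **kwargs)
```

A cell whose firing rate tracks its waveform drift would be flagged by `is_excluded`, whereas a cell whose rate fluctuates independently of waveform distance would usually pass.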

GLM fitting. A Poisson GLM was fit to each cell individually using the MATLAB 
stepwiseglm function. The variables used to fit the model for BW12 data were wall
colour, position, trial time, and session time. A single predictor was used for each
variable. Session time was the total elapsed time, and took into account intertrial
periods. Adjusted R² was used as the criterion for adding or removing terms (0.01
for adding, 0.005 for removing). Firing rate was estimated for 500-ms bins, with no 
smoothing or additional preprocessing. The same process was used for the object 
and figure-eight data, excluding position as a variable. 
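
For orientation, the model form implied by this description (a Poisson GLM with an exponential nonlinearity, as also described in the legend of Extended Data Fig. 2a) can be written as below. The symbols are illustrative labels introduced here and do not appear in the original text: y_k is the spike count in the k-th 500-ms bin, x_{j,k} is the value of predictor j in that bin, and S is the subset of predictors retained by the stepwise procedure.

```latex
\lambda_k = \exp\!\Big(\beta_0 + \sum_{j \in S} \beta_j \, x_{j,k}\Big),
\qquad
y_k \sim \mathrm{Poisson}(\lambda_k)
```

Under this reading, a cell would be labelled selective for a variable when the corresponding term survives stepwise selection.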

Estimating time constants. For cells that were classified as selective for trial or 
session time, a single-term exponential model was fit to the firing rate of each cell, 
$f = a e^{bt}$, in which f is the firing rate of the cell, t is time, a and b are constant coefficients,
and b is the time constant of the cell. For cells selective for session time, the firing 
rate across all trial periods was used. For cells selective for trial time, the average 
firing rate across trials was used. 
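
A single-term exponential of this kind can be fitted in several ways, and the text does not state which routine was used. The sketch below uses ordinary least squares on the logarithm of the firing rate, which is a simplification swapped in for illustration (it requires strictly positive rates and weights errors differently from a nonlinear least-squares fit). The function name is hypothetical.

```python
import math


def fit_exponential(times, rates):
    """Fit rate = a * exp(b * t) by linear regression on log(rate).

    Assumes every rate is > 0 and that times are not all identical.
    """
    logs = [math.log(r) for r in rates]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    cov = sum((t - mean_t) * (l - mean_l) for t, l in zip(times, logs))
    var = sum((t - mean_t) ** 2 for t in times)
    b = cov / var                      # exponent coefficient
    a = math.exp(mean_l - b * mean_t)  # amplitude at t = 0
    return a, b
```

For noiseless data generated from a known exponential, this procedure recovers a and b up to floating-point error.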

Dimension-reduced visualization of data. 2D projections of population activity 
states were constructed by reducing the dimensionality of raw neural data using 
principal component analysis (PCA), followed by applying linear discriminant 
analysis to the dimension-reduced data and taking the top two linear discrimi-
nants. Dimension reduction for visualization was carried out separately for ‘whole
session’ and ‘trial periods only’ plots. For neural trajectories, PCA was applied to
spike trains that were first smoothed using a Gaussian kernel 500 ms wide. The top
two principal components were taken for 2D projections. In both cases, neural data 
were from single animals but pooled across recording days. Bin size was 500 ms for 
both projections of population activity states and neural trajectories. 

Decoding analysis. Decoding was done with linear support vector machine 
classifiers implemented using the LIBLINEAR package, specifically using
L2-regularized L2-loss SVC, tenfold cross-validation, and cost parameter C = 1.
Data consisted of raw spike counts from individual animals, pooled across record- 
ing days, and binned in 10-s bins, except for results varying epoch length, circular 
track, and figure-eight experiments. For shortened epochs, bin size was 1 s for 
20-s-long epochs, 500 ms for 10-s-long epochs, and 50 ms for 1-s-long epochs. 
For the circular track and figure-eight results, the bin size was 500 ms. Multiclass 
decoding was done using the one-versus-all method. The cost parameter was 
varied between 10-3 and 10° with no significant effect. Tenfold cross-validation 
was implemented by splitting the data into 10 subsamples, each randomly drawn 
from across the entire session. Decoders were then trained on 9 of the subsam-
ples, and tested on the remaining subsample, with this process repeated until all
10 subsamples had been used as test data once (Extended Data Fig. 4d). Bin size
was sufficiently shorter than epoch length to ensure that training data covered all
temporal epochs. Decoding accuracy was taken as the average accuracy across all
10 trained decoders. Cross-validation was repeated 1,000 times with overall decod- 
ing accuracy taken as the mean across the 1,000 repetitions, except for when popu- 
lation subsampling occurred, in which case 1,000 different subsamples were taken, 
each with 10 repetitions of cross-validation. For comparing decoding accuracies 
across areas, size-matched populations were used. Size-matched populations were 
generated by subsampling the total population of cells for a given animal without 
replacement. For each animal, subsampling was repeated 1,000 times, and the mean
was taken as the decoding accuracy for that animal. For decoding analyses in which
GLM-classified cells were removed, the size-matched populations for comparison
were randomly drawn from the full population of cells recorded for each individual 
animal without replacement. 
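
The cross-validation scheme described above (ten randomly drawn subsamples, each used once as test data, with accuracy averaged across the ten decoders) can be sketched as follows. To keep the example self-contained, a nearest-centroid classifier stands in for the linear SVM; this is a deliberate substitution and not the LIBLINEAR decoder used in the study. All function names are hypothetical.

```python
import random


def tenfold_indices(n_samples, n_folds=10, seed=0):
    """Split sample indices into folds drawn at random from the whole session."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[k::n_folds] for k in range(n_folds)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier: mean population vector for each label."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def nearest_centroid_predict(centroids, row):
    """Label whose centroid is closest (squared Euclidean distance) to row."""
    def sq_dist(lab):
        return sum((a - b) ** 2 for a, b in zip(centroids[lab], row))
    return min(centroids, key=sq_dist)


def cross_validated_accuracy(X, y, n_folds=10, seed=0):
    """Train on all folds but one, test on the held-out fold, and average."""
    accuracies = []
    for test in tenfold_indices(len(X), n_folds, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        model = nearest_centroid_fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(nearest_centroid_predict(model, X[i]) == y[i] for i in test)
        accuracies.append(hits / len(test))
    return sum(accuracies) / len(accuracies)
```

Here `X` would hold one population vector of binned spike counts per time bin and `y` the epoch label of each bin. Repeating the call with different seeds corresponds to the repeated cross-validation described in the text.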

For the figure-eight and circular track experiments, matched BW data consisted 
of 6-s-long epochs with 500 ms bins and an intertrial interval of 22 s. Because 
inclusion of a matched intertrial interval caused matched data to exceed the length 
of a single BW trial, matched data spanned across 3 trials of BW data. Wall colour
in matched BW data corresponded to trial type for the figure-eight data (black 
walls/left trial, white walls/right trial), and running direction for circular track 
data (black walls/clockwise, white walls/anticlockwise). In total, 20 epochs were 
used from the BW data to match the 20 trials from the figure-eight experiment, 
with population size set to 31 cells. For circular track data, 15 epochs were used 
with population size set to 47 cells. Figure-eight and circular track data were also 
compared against matched BW data that did not account for the intertrial period. 
In this case, matched data were generated by taking activity from the first trial only 
of each session, with matched bin size and population size (20 epochs, each 6-s 
long, made up of 12 500-ms bins, in total spanning the first 120 s of the first trial, 
with population size set to 31 cells for the figure-eight data; 15 epochs, each 6-s 
long, made up of 12 500-ms bins, in total spanning the first 90 s of the first trial, 
with population size set to 47 cells for circular track data). 

Determining statistical significance for decoding accuracies. Statistical sig- 
nificance for decoding accuracy was determined by comparing mean decoding 
accuracy from the original data against mean decoding accuracy from temporally 
shuffled data. Shuffled comparisons were generated by first temporally shuffling 
population activity such that the shuffled order was the same for each cell within 
the population. Decoding accuracy was then determined using the original labels 
for time bins and tenfold cross-validation. This process was repeated 1,000 times to 
generate shuffled comparisons for full-population decoding accuracies, and 1,000 
times for each subsampled population of cells to generate shuffled comparisons 
for size-matched decoding accuracies. All decoding accuracies were significantly 
above chance, with the least significant decoding accuracy having P< 10~°. 
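
The key property of the temporal shuffle described above is that the same permutation of time bins is applied to every cell, so each population vector stays intact while its pairing with the time label is broken. A minimal sketch, assuming the data are stored as a list of time bins with one spike count per cell in each bin (the function name and data layout are illustrative):

```python
import random


def shuffle_time_bins(population, seed=None):
    """Permute time bins with one shared order for all cells.

    population: list of time bins, each a list of spike counts (one per cell).
    Returns a new list; the original labels are then reused unchanged.
    """
    order = list(range(len(population)))
    random.Random(seed).shuffle(order)
    return [population[i] for i in order]
```

Decoding the shuffled data against the original time-bin labels then gives one sample of the chance-level accuracy.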

Estimates of distance between high-dimensional population states. For 
within-trial or intertrial period distances, population states were defined by the 
firing rates of cells from individual animals, pooled across recording days, for 10-s
time bins. Firing rates were not smoothed. Distances were measured for each of the
12 trials or intertrials, and the average was taken as the final result for each animal. 
For across-trial or intertrial period distances, population states were defined in the 
same way, but then the average population state for each trial or intertrial period 
was calculated. Because the data were high-dimensional, Manhattan distance was 
used instead of Euclidean distance. To account for differences in dimensionality
across animals and facilitate comparison, distance measures were z-scored.

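
The distance measure described above can be sketched as follows. The text does not specify which pairs of population states were compared or which standard-deviation estimator was used for z-scoring, so the sketch assumes all pairwise comparisons and the population standard deviation. The function names are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def manhattan(u, v):
    """Manhattan (L1) distance between two population state vectors."""
    return sum(abs(a - b) for a, b in zip(u, v))


def pairwise_distances(states):
    """Manhattan distance for every pair of states (an assumed pairing)."""
    return [manhattan(u, v) for u, v in combinations(states, 2)]


def zscore(values):
    """Standardize distances so animals with different cell counts compare."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]
```

Each state would be the vector of firing rates across cells in one 10-s bin, or the average of such vectors over a trial or intertrial period for the across-period comparison.
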
Classifying spatial cell types and measuring spatial tuning. Spatial rate maps 
were generated by binning activity into 3 x 3 cm spatial bins, calculating firing 
rates for each spatial bin, and then applying a two-dimensional Gaussian kernel 
with standard deviation of 7 cm in both directions. Only spikes recorded during 
running speeds above 2.5 cm s⁻¹ were used. Place cells were classified by com-
paring spatial information scores against a shuffled distribution. Grid cells were
classified by comparing autocorrelogram-based gridness scores against a shuffled
distribution. Speed cells were classified by comparing Pearson correlation values
against a shuffled distribution. Velocity, acceleration and head direction were
calculated using methods published previously. Spatial selectivity for LEC cells
was determined by measuring spatial information for each cell and comparing
against a distribution generated by calculating spatial information for data shuffled 
temporally using wrap-around shuffling. 
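
Wrap-around shuffling shifts the whole spike train by a random offset and wraps spikes that pass the end of the session back to the start. This preserves the temporal structure of the spike train while breaking its alignment with the animal's position. A minimal sketch follows; the minimum shift of 20 s is an assumed value chosen for illustration and is not stated in the text, and the function name is hypothetical.

```python
import random


def wraparound_shuffle(spike_times, session_length, min_shift=20.0, seed=None):
    """Circularly shift spike times (in seconds) by one random offset.

    min_shift is an illustrative assumption; session_length must exceed
    2 * min_shift so that a valid offset exists.
    """
    rng = random.Random(seed)
    shift = rng.uniform(min_shift, session_length - min_shift)
    return sorted((t + shift) % session_length for t in spike_times)
```

Recomputing spatial information on many such shifted spike trains yields the shuffled distribution against which the observed value is compared.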

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code and data availability. Custom code used and datasets generated and/or 
analysed during the current study are available from the corresponding author 
upon request. 
Extended Data Fig. 1 | Histology for LEC animals. a-d, Nissl-stained
coronal sections showing recording locations for LEC animals used in
the BW12 experiments (a), BW4 and figure-eight experiments (b), circular
track experiments (c), and object experiments (d). Red arrowheads
indicate range for tetrode locations. Activity was recorded from neurons in
all layers of the lateral half of the LEC. Dashed lines indicate approximate
anatomical borders of the LEC.
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Extended Data Fig. 2 | Single-cell responses in the LEC related to time. 
a, Schematic of the GLM used for determining cell selectivity. Learned 
weights for the relevant predictors, as determined by the stepwise selection 
process, were put through an exponential nonlinearity that returned 

the mean rate of a Poisson process from which spikes were drawn (for 
LEC, 14.0% selective for session time, 3.8% selective for trial time, 2.7% 
selective for a mixture of trial and session time, percentages averaged 
across all animals). b, Explained variance for all LEC cells fit by the GLM 
for BW12 experiment (n = 186 cells). Average explained variance was 0.05. 
c, Distribution of time constants for trial-time or session-time selective 
cells (n = 80 cells). Time constants were estimated for each cell classified 
as trial-time or session-time selective by fitting a single-term exponential. 
d, Left, five additional example cells exhibiting ramping activity across 

the session. The firing rate of each cell is shown in grey, with the 
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cell. Right, average waveforms across four recording channels for the 

first (green line) and last (black line) quarter of the session. e, Pearson 
correlation between waveform distances (Euclidean distance between pairs 
of consecutive spike waveforms) and firing rate for all LEC cells used in 
the BW12 experiment (n = 451 cells). f, Comparison of the correlation 
between waveform distance and firing rate, as in e, for time-selective 

LEC cells (n = 92 cells, grey) and all other LEC cells used in the BW12 
experiment (n = 359 cells, black). g, Six example cells demonstrating that 
notable fluctuations in firing rate can occur during stable recordings. For 
each cell, the top left row shows activity during trial periods, bottom left 
row shows activity during intertrial periods, and right panel shows average 
waveforms, as in d. 
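
The model in panel a can be written as a short forward computation. The sketch below assumes a bias term and per-bin predictors; the function names and the example ramp predictor are hypothetical, and the stepwise predictor selection is not shown.

```python
import math


def glm_expected_count(weights, predictors, bias=0.0, bin_width=1.0):
    """Weighted predictors -> exponential nonlinearity -> mean of a Poisson process."""
    linear = bias + sum(w * x for w, x in zip(weights, predictors))
    return math.exp(linear) * bin_width


def poisson_log_likelihood(spike_counts, expected_counts):
    """Log-likelihood of observed spike counts given the Poisson means."""
    return sum(
        k * math.log(mu) - mu - math.lgamma(k + 1)
        for k, mu in zip(spike_counts, expected_counts)
    )


# Hypothetical example: one session-time predictor ramping from 0 to 1,
# with a weight chosen so the expected count doubles across the session.
n_bins = 5
session_time = [i / (n_bins - 1) for i in range(n_bins)]
expected = [glm_expected_count([math.log(2.0)], [t]) for t in session_time]
```

Because the nonlinearity is exponential, a positive weight on a session-time predictor produces a multiplicative ramp in the expected firing rate rather than an additive one.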
Extended Data Fig. 3 | Time-cell-like activity. a–h, Temporal specificity
examined as a function of peak firing times of individual cells. a, b, Mean 
activity from example animals for the LEC (top matrix), CA3 (middle 
matrix) and MEC (bottom matrix) during trial (a) and intertrial (b) 
periods. For each matrix, rows show mean firing rates for individual 
cells ordered by the time of peak firing rate. Actual data shown in left 
column, shuffled data shown in right column. Cell identities were not 
maintained across actual and shuffled data. c, Fraction of cells with 
significant temporally specific activity for trial (top) and intertrial 
(bottom) periods (n= 3, 3 and 2 animals for LEC, CA3 and MEC, 
respectively). d, Time of peak activity for cells with significant temporally 
specific activity during trial periods. e, As in d, but for intertrial periods. 
f-h, Temporal specificity examined by calculating temporal information. 
f, Fraction of cells with significant temporal information for trial (top) 
and intertrial (bottom) periods (n = 3, 3 and 2 animals for LEC, CA3 and 
MEC, respectively). g, Distribution of significant temporal information 
scores for trial periods. h, As in g, but for intertrial periods. i, Predictors 
added for expanded GLM: symmetrical ramps and single-trial ramps. 
Coloured lines highlight the predictors for the respective trial indicated on 
top, grey lines are the rest of the available predictors. j, Explained variance 
for all LEC cells fit by expanded GLM for the BW12 experiment (n = 350 
cells). Average explained variance was 0.03. k, Distribution of selectivity 
for time, wall colour and position for single LEC cells, determined 

using the expanded GLM (n=3 animals). Shade indicates the same 
individual across the different variables. l, Left, examples of expanded
GLM fit results for four cells with selectivity for different features. The
firing rate of each cell is shown in grey, with the model-predicted firing
rate in blue. The R² value is shown for each cell. Right, average waveforms
across four recording channels for the first quarter (green line) and last
quarter (black line) of the session. Circles indicate individual
animals, solid lines indicate mean fraction of cells ± s.e.m.


© 2018 Springer Nature Limited. All rights reserved. 




[Extended Data Fig. 4 image not reproduced. Labels recoverable from the extraction: population state for 10-s bin; mean state for 250-s trial; black walls, white walls; LD1; principal component; actual time (min); time (min); fold, time bin, epoch, train and test for the 1st, 2nd and 5th iterations; Z-scored accuracy for single day, consecutive day-pair and cross-experiment day-pair; number of cells, pooled within animal and pooled across animals (BW12 and BW4).]
Extended Data Fig. 4 | Decoding of temporal epochs. a, 2D projections
of neural population responses during the BW12 experiment for trial 
periods only. Axes correspond to the first two linear discriminants (LD1 
and LD2; arbitrary units). Left column shows LEC population responses, 
middle column shows CA3 population responses, right column shows 
MEC population responses, each from an example animal. The wall 
colour of each trial is indicated by a shade of green (black walls) or purple 
(white walls), with progression of shade from dark to light indicating 

the progression of trials. b, Fraction of variance explained by the first 

20 principal components for each area. Principal components were 
computed using PCA on raw data. Lines indicate variance explained 

for individual animals (n = 3, 3 and 2 animals for LEC, CA3 and MEC, 
respectively). c, Regressing the first two principal components from PCA 
results for individual animals against time leads to significant fits for 

all areas, but substantially higher explained variance for LEC. P< 0.001 
(LEC versus CA3, t(4) = 9.79), P< 0.01 (LEC versus MEC, t(3) = 6.13), 
unpaired t-test. Top row, example fits for individual animals are shown 
with black lines indicating time, coloured lines indicating regression fit 
and R² values indicated for the example fit. Bottom two rows, first two
principal components for example fits. d, Illustration of cross-validation 
procedure: fivefold cross-validation is shown for data containing four 
temporal epochs, with five time bins in each epoch. A different subset 

of time bins is used as test data for each iteration of the cross-validation 
procedure. Actual data consisted of 24 epochs (trial and intertrial periods) 
with 25 or 14 time bins in each epoch, and tenfold cross-validation was 
used. e, Z-scored decoding accuracy using cells recorded in a single day 
(left, P= 0.68, one-sided binomial test, n = 46 days), pairs of consecutive 
days (middle, P= 0.29, one-sided binomial test, n = 72 pairs), pairs 

of days separated by half the total number of recording days (right, 
P=0.37, one-sided binomial test, n = 44 pairs). f, Decoding accuracy for 
temporal epoch using behaviour tracking data in place of neural activity 
(n=3 animals). ‘All tracking data consisted of the animal’s position, 
velocity, acceleration and head direction. g, Left, decoding accuracy for
temporal epoch using BW4 data, compared to decoding accuracy using
BW12 data. P = 0.32 (t(5) = 1.09), unpaired t-test; matched population
size = 47 cells, n=4 and 3 animals for BW4 and BW12 respectively. Right, 
decoding accuracy for wall colour from the BW4 experiment, compared to 
decoding accuracy using matched data from BW12 experiment. P = 0.35 
(t(5) = 1.03), unpaired t-test; matched population size = 47 cells, n =4 
and 3 animals for BW4 and BW12 respectively. h, Decoding accuracy 

for temporal epoch using BW4 data from the LEC, MEC, CA3, CA2 and 
CA1. One-way ANOVA, F(4) = 20.78, P< 1 x 107°, post hoc Bonferroni 
multiple comparisons test, P< 0.005 (for each comparison against LEC); 
matched population size = 24 cells, n =7, 2, 7, 3 and 3 animals for LEC, 
MEC, CA3, CA2 and CA1 respectively. i, Relation between population
size and decoding accuracy for the LEC, CA3 and MEC. Left, decoding 
accuracy for varying population sizes; each line indicates the curve fit to 
data (shown as points) from one animal, with colours indicating recording 
area (n = 3, 3 and 2 animals for LEC, CA3 and MEC, respectively). Right, 
relation between population size and decoding accuracy for the LEC 

and CA3, pooled across animals from BW12 and BW4 experiments 

(data pooled from n =7 animals for both LEC and CA3). j, Decoding 
accuracy for wall colour from trial period activity alone for the LEC, 

CA3 and MEC. P=0.47 (LEC versus CA3, t(4) = 0.81) unpaired t-test; 
matched population size = 28 cells, n = 3, 3 and 2 animals for LEC, CA3 
and MEC, respectively. k, Decoding accuracy for wall colour using data 
that was shuffled in time. P= 0.74 (t(2) =0.38), two-tailed paired t-test; 
n = 3 animals. l, Decoding accuracy for wall colour using a subpopulation
with all wall-colour-selective cells removed, compared to size-matched 
populations that were randomly drawn from full population. P= 0.12 
(t(2) = 2.66), paired t-test; n =3 animals with 77, 126 and 195 cells not 
selective for wall colour. For f, g, j-l, circles indicate individual animals, 
solid lines indicate mean decoding accuracy ± s.e.m., dashed lines indicate
chance levels. 
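
The cross-validation scheme in panel d can be illustrated with a short sketch. It assumes that the time bins of each epoch are spread as evenly as possible across folds, so that every epoch contributes to the test set in every iteration; that stratification and the function names are assumptions for illustration, and the decoder itself is not shown.

```python
import random


def assign_folds(bins_per_epoch, n_folds, seed=0):
    """For each epoch, give every time bin a fold label in range(n_folds)."""
    rng = random.Random(seed)
    folds = []
    for n_bins in bins_per_epoch:
        labels = [i % n_folds for i in range(n_bins)]  # near-equal fold sizes
        rng.shuffle(labels)
        folds.append(labels)
    return folds


def split_bins(folds, test_fold):
    """Return (train, test) lists of (epoch, time_bin) index pairs."""
    train, test = [], []
    for epoch, labels in enumerate(folds):
        for time_bin, fold in enumerate(labels):
            (test if fold == test_fold else train).append((epoch, time_bin))
    return train, test


# Illustration from the caption: four epochs, five time bins each, fivefold.
toy_folds = assign_folds([5, 5, 5, 5], n_folds=5)

# Sizes stated for the actual data: 24 epochs alternating 25 and 14 bins, tenfold.
real_folds = assign_folds([25, 14] * 12, n_folds=10)
```

With 24 temporal epochs as classes, a decoder guessing at random would be correct 1/24 of the time, or about 4.2%.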
[…] or MEC (gold, n = 2 animals) activity, quantified by applying several
distance measurements to population activity states (see Supplementary
Information). The distance being measured is illustrated in cartoon
form above each plot (d₁: distance between time bin 1 and time bin 2,
d₂: distance between time bin 2 and time bin 3, and so on). a, Manhattan
[…] intertrial period and all other time bins within that period. b, Manhattan
[…] period. c, Left, Manhattan distance between the population state for the
[…] model. Red line indicates model fit. d, Manhattan distance between
consecutive pairs of population states across trial or intertrial periods.
e, Pairwise angles, measured across consecutive points in time along
neural trajectories during trial or intertrial periods as the angle between
two vectors, each defined as the difference of consecutive population
[…] the first trial or intertrial period and the overall mean population states of
[…] z-scored.
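
Two of the measurements named in this caption, the Manhattan distance between consecutive population states and the angle between consecutive difference vectors, can be sketched as follows. Population states are assumed to be plain lists of per-cell firing rates, one list per time bin; the function names are hypothetical and any z-scoring is left out.

```python
import math


def manhattan(a, b):
    """Sum of absolute per-cell differences between two population states."""
    return sum(abs(x - y) for x, y in zip(a, b))


def consecutive_manhattan(states):
    """d1 = distance(bin 1, bin 2), d2 = distance(bin 2, bin 3), and so on."""
    return [manhattan(states[i], states[i + 1]) for i in range(len(states) - 1)]


def consecutive_angles(states):
    """Angle in degrees between successive steps of the neural trajectory."""
    steps = [
        [y - x for x, y in zip(states[i], states[i + 1])]
        for i in range(len(states) - 1)
    ]
    angles = []
    for u, v in zip(steps, steps[1:]):
        norm_u = math.sqrt(sum(x * x for x in u))
        norm_v = math.sqrt(sum(x * x for x in v))
        if norm_u == 0 or norm_v == 0:
            angles.append(float("nan"))  # angle undefined for a zero-length step
            continue
        cosine = sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)
        cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
        angles.append(math.degrees(math.acos(cosine)))
    return angles
```

Small angles indicate a trajectory that keeps moving in the same direction through state space, whereas angles near 90 degrees indicate successive steps that are unrelated in direction.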
Extended Data Fig. 6 | Decoding shortened temporal epochs.

a, Decoding accuracies for different temporal epoch lengths across the
LEC, MEC, CA3, CA2 and CA1 using BW4 data (n = 7, 2, 7, 3 and 3
animals for LEC, MEC, CA3, CA2 and CA1, respectively). Decoding
accuracy for 20-s epochs using LEC data was significantly better than 

all hippocampal areas. P< 1 x 107° (F(17) = 20.69) one way ANOVA, 
post hoc Bonferroni multiple comparisons test; P < 0.05 (LEC versus 
MEC), P< 0.001 (LEC versus CA3, CA2 or CA1); matched population 
size = 24 cells. Decoding accuracy was higher for the LEC and MEC 
than for the CA3 for 10-s epochs. P< 0.001 (LEC versus CA3, all other 
comparisons were not significant, F(17) = 8.00), one-way ANOVA, post 
hoc Bonferroni multiple comparisons test; matched population size = 24 
cells. Decoding accuracy was similar across all areas for 1-s epochs. 
P=0.05 (F(17) =2.95), one-way ANOVA. b, Confusion matrix from an 
example LEC animal for 10-s epochs. The matrix contains 468 epochs 
(each trial period of 250 s divided into 25 10-s epochs, each intertrial 
period of 140 s divided into 14 10-s epochs, 39 epochs per trial/intertrial 
pair, 12 total pairs across the session, giving 468 epochs). Left, confusion 
matrix for the entire session. Right, sum across trial/intertrial sections 
of the whole session matrix (outlined by dashed line in whole session 
matrix). c, Decoding accuracy for re-binned confusion matrices, see 
Supplementary Information (circles and solid lines with black indicates 
mean, shade indicates the same individual; P< 1 x 107+ (comparing across 
epoch lengths, F(3) = 32.28) one-way ANOVA; P= 0.12 (trial versus 20 s), 
P<0.001 (trial versus 10 s, and trial versus 1 s)) compared to decoding 
accuracy for re-binned confusion matrices following shuffling along 
columns, with diagonals preserved (triangles and dashed lines, with black 
indicating mean and shade indicating individual animals; comparing 
decoding accuracies from shuffled and unshuffled confusion matrices: 
two-sided paired t-tests, for 20 s: t(2) = 7.42, P < 0.05; for 10 s: t(2) = 14.31,
P < 0.01; for 1 s: t(2) = 4.46, P < 0.05). Grey dashed line indicates chance
at 4.2%, shade indicates individual animals. d, Decoding accuracy for 
different temporal epoch lengths for the LEC (matched population 
size = 90 cells, n = 3 animals), based only on activity within individual 
trial or intertrial periods (top or bottom, respectively). Decoding accuracy 
was measured for each trial or intertrial period individually, and then 
the average was taken across all trials or intertrial periods, respectively, 
for each animal. Temporal bin size was: 1 s for 20-s-long epochs, 500 ms 
for 10-s-long epochs, and 50 ms for 1-s-long epochs. Chance levels for 
trial periods were 8.3%, 4.0% and 0.4%, and 14.3%, 7.1% and 0.7% for 
intertrial periods. e, Confusion matrices from an example LEC animal for 
single-trial or intertrial periods as in d, using 20-s (top) and 10-s (bottom) 
epochs, with intertrial results on the left and trial results on the right. 
For a and d, circles indicate individual animals, solid lines indicate mean 
decoding accuracies ± s.e.m., dashed lines indicate chance levels.
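
The epoch counts and chance levels quoted in this caption follow from simple arithmetic, reproduced below as a check. The sketch assumes that an incomplete final epoch is dropped (which is what the 8.3% figure for 20-s epochs in a 250-s trial implies) and that chance is one over the number of epochs being decoded; the function names are hypothetical.

```python
def n_epochs(period_s, epoch_s):
    """Number of whole epochs that fit in a period (partial epoch dropped)."""
    return period_s // epoch_s


def chance_percent(n_classes):
    """Chance-level accuracy for a uniform guess among n_classes."""
    return 100.0 / n_classes


TRIAL_S, INTERTRIAL_S, N_PAIRS = 250, 140, 12

# Whole-session confusion matrix with 10-s epochs.
epochs_per_pair = n_epochs(TRIAL_S, 10) + n_epochs(INTERTRIAL_S, 10)
total_epochs = epochs_per_pair * N_PAIRS

# Within-period chance levels for 20-s, 10-s and 1-s epochs.
trial_chance = [round(chance_percent(n_epochs(TRIAL_S, e)), 1) for e in (20, 10, 1)]
intertrial_chance = [
    round(chance_percent(n_epochs(INTERTRIAL_S, e)), 1) for e in (20, 10, 1)
]

# Chance for decoding one of 24 trial and intertrial periods.
period_chance = round(chance_percent(2 * N_PAIRS), 1)
```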
Extended Data Fig. 7 | Temporal coding within a fixed environmental
context. a, Experimental design: 10 min without object, 10 min with 
object, and 10 min without object. b, Predictors used for object GLM: 
object presence, trial time and session time (24.0% selective for session 
time, 4.8% selective for trial time, 4.3% selective for a mixture of trial time 
and session time, percentages averaged across all animals). c, Explained 
variance for all LEC cells fit by GLM for object experiments (n = 150 cells). 
Average explained variance was 0.05. d, Distribution of selectivity for time, 
object and mixtures of time and object for single LEC cells (n = 3 animals 
with 56, 57 and 150 cells). Shade indicates the same individual across the 
different variables. e, Examples of GLM fit results for 12 cells from the 
object experiment with selectivity for different features. The firing rate 
of each cell is shown in grey, with the model-predicted firing rate in 


blue. The R² value is shown for each cell. f, 2D projection of LEC neural
population responses during object experiment from example animal. 

g, Decoding accuracy for trial identity or object presence (n = 3 animals). 
h, Decoding accuracies for temporal epochs of shortened length (n = 3 
animals, matched population size = 56 cells). i, Decoding accuracy for trial 
identity (left) or object presence (right), with cells selective for decoded 
variable removed, compared to size-matched populations randomly drawn 
from full population. Trial identity: P < 0.05 (t(2) =4.95), paired t-test; 
n= 29, 26 and 79 cells not selective for time; object presence: P< 0.05 

(t(2) =7.17), paired t-test; n = 39, 47, and 110 cells not selective for 

object presence. Circles indicate individual animals, solid lines indicate 
mean ± s.e.m. of described measurement, dashed lines indicate chance
levels. 
Extended Data Fig. 8 | Explicit versus inherent mechanisms for
temporal coding. a, Top, a series of experiences occurs, each containing 
different event content and spanning different amounts of time. Bottom, 
two different ways in which temporal information within this series of 
experiences may be encoded. For both cases, a population code is used, 
but this may just as easily be replaced by a rate code within single cells. 
An explicit mechanism (left) purposely represents the passage of time, 
such that each chunk of time is represented equally. Thus, two experiences 
with the same temporal length but differing numbers of events would 
correspond to the same change in activity. An inherent mechanism (right) 
encodes temporal information entirely by representing the events within 
each experience. Thus, two experiences with the same temporal length 
but differing numbers of events would correspond to differing changes 

in activity. In either case, the high dimensionality of the representations 
would allow temporal information to be read out easily by downstream 
readout neurons, for example, cells in the hippocampus. b, As in a, but 

in this example, instead of a series of different experiences, the same 
experience is repeated three times (analogous to performing a learned 
task three times). Here, an explicit mechanism for temporal coding would 
exhibit the same amount of change in activity as in a, whereas an inherent 
mechanism would exhibit considerably reduced differences in activity 
across the experiences. 
Extended Data Fig. 9 | Decoding trial identity with additional matched
data. a, Euclidean distance between mean LEC population states for 

pairs of adjacent trials with different or same wall colour. P< 10° 

(t(31) = 8.81), unpaired t-test; n = 12 and 21 for same colour (BB/WW)
and different colour (BW/WB) transitions, respectively, pooled from three 
animals. b, Decoding accuracy for trial identity during the figure-eight 
experiment, compared to decoding accuracy for temporal epoch using 
matched data from BW experiments that did not account for intertrial 
intervals or trial type. P< 0.01 (t(8) =4.91), unpaired t-test; matched 
population size = 31 cells, n =3 and 7 animals for figure-eight and BW, 
respectively. c, Circular-track task, in which animals alternated between 
clockwise and anticlockwise runs for 15 consecutive back and forth laps. 
Black circle indicates midpoint of the track. d, Left, 2D projection of 

the LEC neural population response during circular track experiment 
from example animal. Right, 2D projection of the LEC neural population
response during matched periods from BW experiments. e, Left, decoding 
accuracy for trial identity during circular track experiment compared 

to decoding accuracy for temporal epoch using matched data from 

BW experiments. P < 0.0001 (t(7) = 6.21), unpaired t-test; matched 
population size = 47 cells, n=2 and 7 animals for circular track and BW, 
respectively. Right, as in the left panel, but using temporally consecutive 
data that did not account for intertrial intervals or running direction. 
P<0.05 (t(7) = 3.17), unpaired t-test; matched population size = 47 cells, 
n = 2 and 7 animals for circular track and BW, respectively. f, Confusion
matrix for decoding trial identity in the circular track experiment from 
example animal. Circles indicate individual animals, solid lines indicate 
mean ± s.e.m. of described measurement, dashed lines indicate chance
levels. 
Extended Data Fig. 10 | Additional characterization of LEC activity
during figure-eight task. a, Manhattan distance between consecutive 
pairs of population states (binned in 500-ms time bins) across single trials, 
averaged across all animals. b, Predictors used for the figure-eight GLM: 
trial type, trial time and session time (1.3% selective for session time, 9.3% 
selective for trial time, 6.5% selective for a mixture of trial time and session 
time, percentages averaged across all animals). c, Explained variance 

for all LEC cells fit by GLM for the figure-eight experiment (n = 149 

cells). Average explained variance was 0.10. d, Distribution of selectivity 
for time, trial type and a mixture of time and trial type for single LEC 
cells, determined using a GLM (n =3 animals with 72, 76 and 31 cells). 
Circles indicate individual animals, solid lines indicate mean fraction 

of cells ± s.e.m., shade indicates the same individual across the different
variables. e, Examples of GLM fit results for six cells from the figure-eight 
experiment with selectivity for different features. The firing rate of each
cell is shown in grey, with the model-predicted firing rate in blue. The
R² value is shown for each cell. f, LEC activity for 12 example cells during
the figure-eight task. Each plot shows the mean firing rate (top, 95% 
percentile confidence interval shaded), and peristimulus time histograms 
for left-turn (middle) and right-turn trials (bottom), with time centred on 
the point at which the animal reaches the base of the central stem on each 
trial. Cells 7-12 exhibited similar firing patterns for both left- and right- 
turn trials, including during the first 3 s of the trial, in which the animal 
occupied a different spatial location for left- versus right-turn trials. Such 
activity may be used for temporal information. Cells 13-18 exhibited 
highly divergent firing patterns for left- and right-turn trials, which may 
reflect the animal's spatial location, the behavioural context of the trial, or 
a combination of the two variables. All example cells exhibited relatively 
stable firing across trials, a common feature observed during the figure- 
eight task. 
Role of glutamine synthetase in
angiogenesis beyond glutamine synthesis 
Francisco Morales-Rodriguez, Bert Cruys, Lucas Treps, Leanne Ramer, Stefan Vinckier, Katleen Brepoels,
Sabine Wyns, Joris Souffreau, Luc Schoonjans, Wouter H. Lamers, Yi Wu, Jurgen Haustraete, Johan Hofkens,
Sandra Liekens, Richard Cubbon, Bart Ghesquière, Mieke Dewerchin, Francesco L. Gervasio, Xuri Li,
Glutamine synthetase, encoded by the gene GLUL, is an enzyme that converts glutamate and ammonia to glutamine.
It is expressed by endothelial cells, but surprisingly shows negligible glutamine-synthesizing activity in these cells at 
physiological glutamine levels. Here we show in mice that genetic deletion of Glul in endothelial cells impairs vessel 
sprouting during vascular development, whereas pharmacological blockade of glutamine synthetase suppresses 
angiogenesis in ocular and inflammatory skin disease while only minimally affecting healthy adult quiescent endothelial 
cells. This relies on the inhibition of endothelial cell migration but not proliferation. Mechanistically we show that in 
human umbilical vein endothelial cells GLUL knockdown reduces membrane localization and activation of the GTPase 
RHOJ while activating other Rho GTPases and Rho kinase, thereby inducing actin stress fibres and impeding endothelial 
cell motility. Inhibition of Rho kinase rescues the defect in endothelial cell migration that is induced by GLUL knockdown. 
Notably, glutamine synthetase palmitoylates itself and interacts with RHOJ to sustain RHOJ palmitoylation, membrane 
localization and activation. These findings reveal that, in addition to the known formation of glutamine, the enzyme 
glutamine synthetase shows unknown activity in endothelial cell migration during pathological angiogenesis through
RHOJ palmitoylation.


Endothelial cells (ECs) line the lumen of blood vessels. Emerging evi- 
dence reveals that EC metabolism controls vessel sprouting (angio- 
genesis)'~*. Although glutamine catabolism in ECs has been recently 
characterized‘, it is not known whether glutamine anabolism controls 
angiogenesis in vivo. Glutamine is a carbon and nitrogen donor for 
the production of biomolecules and is involved in redox homeostasis. 
Most cells take up glutamine, and therefore do not need to synthesize 
it. However, certain cell types express GLUL (glutamate-ammonia 
ligase), the gene that encodes the enzyme glutamine synthetase (GS), 
which is capable of de novo glutamine production from glutamate and 
ammonia in a reaction that requires ATP and Mg²⁺ or Mn²⁺. GS also
serves another biochemical function—the clearance of ammonia— 
but this is best described for hepatocytes, astrocytes and muscle. ECs 
also express GS°, although its role and importance in angiogenesis 
is unclear given that ECs are exposed to high plasma glutamine lev- 
els. Global deficiency of GS causes embryonic lethality, presumably 
owing to the inability to detoxify ammonia®. GS deficiency in humans 
is extremely rare and leads to multi-organ failure and infant death’. 
Whether and how GS affects angiogenesis has, to our knowledge, not
yet been analysed. Here we characterized the role and importance of
GS in vessel sprouting.


Vessel sprouting requires endothelial GS 

We checked the expression of GS in endothelial cells of the retinal 
microvasculature with a genetic Glul reporter mouse (Glul^+/GFP mice
with a nucleus-targeted GFP-lamin A fusion reporter transgene in the 
Glul open reading frame of one allele®). GFP tracing in the post-natal 
day (P)5 retinal plexus, co-stained with the endothelial cell marker 
isolectin B4 (IB4; red), revealed endothelial expression of GFP (and 
therefore expression of GS) in the microvasculature (Fig. 1a). 

Human umbilical vein ECs (HUVECs) expressed GLUL at sim- 
ilar levels to human colon ECs, liver ECs, human umbilical artery 
ECs and blood outgrowth ECs, but at a lower level than lung ECs 
(Extended Data Fig. 1a). However, the expression of GS in ECs or 
isolated mouse liver ECs was lower than in HEPG2 hepatocellular 
carcinoma cells or astrocytes (Extended Data Fig. 1a–c), which are
known to highly express GS. Glutamine withdrawal (below the phys- 
iological concentration of 0.6 mM) increased GS protein levels in ECs 
Fig. 1 | EC-specific deletion of GS causes vascular defects in vivo.

a, GS expression (arrowheads) in the retinal microvasculature (co-stained
with IB4) of P5 chimaeric Glul^+/GFP pups. The area indicated by the white
box is shown magnified on the right. ES cells, embryonic stem cells. b, GS
protein levels in HUVECs under different concentrations of extracellular
glutamine. c, Glul mRNA levels upon activation of VE-cadherin-cre^ERT2.
d–g, IB4 staining of P5 retinal vascular plexuses from wild-type (WT) (d)
and Glul^vECKO (e) mice (with magnifications shown in the insets; A, artery;
V, vein) and quantification of branch points at the front of the plexus
(f) and radial expansion of the plexus (g). h, Vessel regression (area of
collagen IV (Col IV)⁺IB4⁻ vessel sleeves as a percentage of total Col IV⁺
area) in retinas from P5 wild-type and Glul^vECKO pups. i, j, Quantification
of distal sprouts (i) and filopodia (j) at the retinal vascular front. k–m,
IB4 (grey)/EdU (cyan) double staining of P5 wild-type (l) and Glul^vECKO
(m) retinas (magnifications are shown in the insets, with arrowheads

(Fig. 1b, Extended Data Fig. 1b), as has been previously documented 
for other cell types’. 

We intercrossed Glul’*"** mice with two different EC-specific 
tamoxifen-inducible Cre driver lines, VE-cadherin(PAC)-cre^ERT2
and Pdgfb-cre^ERT2 mice, to obtain Glul^vECKO and Glul^pECKO mice,
respectively (in which ECKO indicates endothelial cell knockout; 
VE-cadherin is also known as Cdh5). Correct recombination of the 
excised Glul allele was confirmed (Extended Data Fig. 1d, e) and caused 
an average reduction of 84% in Glul mRNA levels in mouse liver ECs 
isolated from Glul^vECKO mice (Fig. 1c). In the neonatal retina, vascular
plexuses in P5 Glul^vECKO mice showed hypobranching and reduced
radial expansion, whereas vessel coverage by NG2⁺ pericytes (NG2,
chondroitin sulfate proteoglycan 4) and vessel regression (number of
empty collagen IV⁺ sleeves) were unaffected (Fig. 1d–h, Extended Data
Fig. 1f, g). However, the numbers of filopodia at the vascular front and
of distal sprouts with filopodia, both parameters of EC migration, were
lower in Glul^vECKO mice (Fig. 1i, j). Furthermore, the complexity of the
vasculature at the utmost leading front of the plexus was decreased,
as determined by counting the number of branches in distal sprouts
(Extended Data Fig. 1h). By contrast, quantification of IB4⁺EdU⁺ cells
(EdU, 5-ethynyl-2’-deoxyuridine) revealed no difference in the number
of proliferating ECs (Fig. 1k–m, Extended Data Fig. 1i). Hypobranching
was also observed in the dorsal dermal blood vasculature in embryonic
day (E)16.5 Glul^vECKO embryos (Fig. 1n–r). A similar retinal phenotype
denoting EdU⁺ ECs) and quantification (k) of EdU⁺ ECs at the front of
the plexus. n–r, CD31-stained dermal dorsal blood vasculature in E16.5
wild-type (n, o) and Glul^vECKO (p, q) mice with boxed regions magnified
in o and q, and quantification of the number of branch points (r). All data
are mean ± s.e.m., with individual data points shown in c, f–k and r; n = 2
individual experiments (a, b); values of n (individual mice) for wild-type
and Glul^vECKO, respectively, are: 3 and 3 (c); 11 and 10 (f); 10 and 7 (g); 4
and 6 (h); 18 and 22 (i); 17 and 21 (j); 12 and 22 (k); 5 and 15 (1), from 2
(g, h, r), 3 (f), 4 (k) or 5 (i, j) litters. NS, P > 0.05, *P < 0.05 according to
Student’s t-test (c, g, h, i, j, k, r) or mixed-models R statistics (f). Exact
P values are as follows: 0.0215 (c); 0.0141 (f); 0.0063 (g); 0.4902 (h);
0.0009 (i); 0.0484 (j); 0.3837 (k); 0.0046 (r). Scale bars: 10 μm (a right),
50 μm (a left), 100 μm (l, m), 200 μm (d, e, n, p). For gel source images, see
Supplementary Fig. 1.


was observed in Glul^pECKO mice (Extended Data Fig. 1j–m). Therefore,
loss of endothelial GS causes vascular defects by impairing EC migra- 
tion but not proliferation. 

The retinal vascular defect was restored over time (Extended Data 
Fig. 1n–u), and at six weeks, Glul^vECKO mice (with Glul deleted in ECs at
P1–P3) did not show overt vascular defects (Extended Data Fig. 1v–ag).
Glul^vECKO mice gained normal body weight, and blood biochemistry
and haematological profiles were normal at six weeks (Extended Data 
Table 1). Vascular restoration may relate to the possibility that homo- 
zygous mutant ECs were outcompeted over time by residual wild-type 
ECs, in which recombination did not occur (as documented in mice 
with endothelial knockout of other key metabolic genes”) or because 
of other compensatory adaptations. Alternatively, the results raise the 
question of whether the effect of endothelial GS loss may be larger in 
growing (motile) ECs during vascular development than in quiescent 
(non-motile) ECs during adulthood in healthy conditions. 

We then explored whether pharmacological blockade of GS with 
methionine sulfoximine (MSO), which irreversibly blocks the cata- 
lytic activity of GS, reduced pathological angiogenesis. First, in the 
oxygen-induced model of retinopathy of prematurity (ROP)*°, the 
treatment of pups with MSO reduced the formation of pathological 
vascular tufts (Fig. 2a-c), while modestly increasing the vaso-obliterated 
area (Fig. 2d, Extended Data Fig. 1ah, ai). Second, we used the 
corneal micro-pocket assay, with slow-release pellets containing basic 
Fig. 2 | GS inhibition mitigates pathological angiogenesis. a-d, Retinal 
flat-mounts of ROP mice treated with vehicle (a) or 20 mg kg⁻¹ day⁻¹ 
MSO (b). Quantification of vascular tuft (c) and vaso-obliterated area 

(d) in control and MSO-treated ROP mice. e-g, Quantification (e) of 
CD31+ (green) neo-vessels in corneal flat-mounts from mice in corneal 
pocket assays with bFGF pellets (demarcated by a dotted white line) with 
vehicle (f) or MSO (g). h-l, CD105 staining of untreated skin (h), IMQ-treated 
skin (i), IMQ + low-dose MSO-treated skin (j), IMQ + high-dose 
MSO-treated skin (k) and quantification of CD105+ area (l). All data are 
mean ± s.e.m., with individual data points shown for c, d, e and l; values 


fibroblast growth factor (bFGF) as a mouse model of corneal neovas- 
cularization. Inclusion of MSO in the pellet reduced the formation of 
new CD31+ blood vessels in the otherwise avascular cornea (Fig. 2e-g). 
Finally, we used an imiquimod (IMQ)-based mouse model of inflam- 
mation-driven skin psoriasis, and found a marked dose-dependent 
reduction of the CD105+ EC area upon topical treatment of the affected 
skin with MSO (Fig. 2h-l). Therefore, pharmacological GS blockade 
inhibits pathological angiogenesis in the inflamed skin and in several 
eye disorders. 


Silencing GLUL reduces EC migration 

We then used GLUL knockdown (GLUL^KD) HUVECs (mediated by 
short hairpin RNA (shRNA); >80% silencing; Extended Data Fig. 2a) 
in in vitro spheroid-sprouting assays to assess vessel sprouting. GLUL^KD 
ECs showed a reduced number of sprouts per spheroid and a reduction 
in the total sprout length (Fig. 3a, b, e, f). Re-introduction of a 
shRNA-resistant GLUL (rGLUL^OE; in which r indicates shRNA-resistance 
and OE indicates overexpression) rescued the sprouting defect 
(Extended Data Fig. 2b, c). The sprouting defect in GLUL^KD spheroids 
was maintained upon mitotic inactivation of ECs with mitomycin C 
(Fig. 3c-f), which further suggests a defect in EC motility. In agree- 
ment with this, at physiological glutamine levels GLUL knockdown 
did not affect EC proliferation (Fig. 3g). The sprouting defect was also 
not due to reduced EC viability or increased oxidative stress, or to 
changes in energy charge, glutathione or NADPH levels, glycolysis, 
glucose or glutamine oxidation, or oxygen consumption (Extended 
Data Fig. 2d-m). 

GLUL^KD ECs showed impaired migration in scratch-wound and 
Boyden chamber assays, even upon treatment with mitomycin C, an 
effect that was rescued by re-introducing the shRNA-resistant GLUL 
(rGLUL^OE) (Fig. 3h, i). Furthermore, sparsely seeded GLUL^KD ECs 
had a reduced velocity of random movement (Fig. 3j, Supplementary 
Videos 1, 2) and a decreased lamellipodial area (Fig. 3k-m). 
Comparable results were obtained with a second non-overlapping 
shRNA and a GLUL-specific small interfering RNA (siRNA) (Extended 
Data Figs. 2a, 3a-e). 

The migration defects suggested that the remodelling of the actin 
cytoskeleton, which is necessary for cellular motility, was perturbed 
in GLUL^KD ECs. Notably, we detected an increase in F-actin levels 
in GLUL^KD ECs (Fig. 3n). A role for GS in cytoskeletal remodelling 
of n (individual mice) for control and MSO-treated are: 7 and 6 (c, d), 
10 and 11 (e) from 3 litters (c, d) and 2 experiments (e). In l, n = 15 mice 
for control, n = 22 for IMQ, n = 18 for IMQ + MSO low-dose (indicated 
by +), and n = 6 for IMQ + MSO high-dose (indicated by ++) from 3 
experiments. NS, P > 0.05, *P < 0.05 according to Student's t-test (c, d, e) 
or one-way analysis of variance (ANOVA) with Dunnett’s multiple 
comparisons versus IMQ (l). Exact P values are as follows: 0.0459 (c); 
0.0145 (d); <0.0001 (e); control versus IMQ: 0.0278; MSO low versus 
IMQ: 0.7283; MSO high versus IMQ: 0.0451 (l). Scale bars: 100 µm (a, b), 
200 µm (f, g), 75 µm (h-k). 


was further suggested by analysing the repolymerization of the actin 
cytoskeleton upon disruption with the F-actin polymerization inhibitor 
latrunculin B and subsequent wash-out. Latrunculin B perturbed the 
normal morphology of control and GLUL^KD ECs (Fig. 3o-r). After 
wash-out, when control cells had rebuilt a normal actin cytoskeleton, 
GLUL^KD ECs still had higher F-actin levels, which mainly originated 
from increased numbers of stress-fibre bundles (Fig. 3s-u). GLUL^KD 
ECs did not display altered α-tubulin levels (Fig. 3v, Extended Data 
Fig. 4a-h). 

The increase in F-actin levels was also present in ECs that were 
freshly isolated from MSO-treated mice (Extended Data Fig. 4i-k), and 
in confluent GLUL^KD ECs aligning a scratch wound in vitro (Extended 
Data Fig. 4l-n). Confluent monolayer GLUL^KD ECs displayed compromised 
junctional integrity (Extended Data Fig. 4o-v). Functionally, 
this corresponded to a decrease in the transendothelial electrical resistance 
of GLUL^KD ECs in vitro (Extended Data Fig. 4w) and increased 
leakiness of inflamed (but not healthy) vessels in vivo (Extended Data 
Fig. 4x-z). 


Glutamine production by endothelial GS 

To explore whether the migration defect was attributable to reduced 
de novo glutamine synthesis, we measured the glutamine-synthesizing 
activity of GS by supplementing ECs with ¹⁵NH₄Cl (Extended Data 
Fig. 5a). At a physiological concentration of 0.6 mM glutamine or 
higher, the glutamine-producing activity of GS was negligible, at 
approximately the level observed in ECs treated with MSO. It slightly 
increased only upon glutamine withdrawal, presumably to compen- 
sate for the lack of available glutamine (Fig. 4a). Similar results were 
obtained in a medium containing dialysed serum (Extended Data 
Fig. 5b). For further details see Supplementary Discussion 1 and 
Extended Data Fig. 5c-n. 
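
The readout used for this tracing experiment (percentage enrichment of the M + 1 isotopologue) can be illustrated with a short calculation. This is a minimal sketch, assuming that enrichment is the M + 1 signal divided by the summed signal of all measured isotopologues, without any natural-abundance correction; the function name `percent_enrichment` and the intensities are hypothetical and do not reproduce the analysis pipeline of this study.

```python
def percent_enrichment(isotopologue_intensities, label_index=1):
    """Percentage of the total signal carried by one isotopologue.

    isotopologue_intensities: signal for M+0, M+1, M+2, ... in that order.
    label_index: position of the isotopologue of interest (1 = M+1).
    Assumption: no correction for natural isotope abundance is applied.
    """
    total = sum(isotopologue_intensities)
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 100.0 * isotopologue_intensities[label_index] / total


# Hypothetical intensities for M+0 and M+1 glutamine (not study data)
hypothetical_gln = [950.0, 50.0]
print(percent_enrichment(hypothetical_gln))
```

A near-zero M + 1 fraction in the presence of labelled ammonium, as described above for physiological glutamine levels, would indicate negligible de novo synthesis under this simplified definition.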

To determine whether the phenotype of GLUL^KD ECs relied on the 
catalytic site of GS, we used MSO—which is an irreversible inhibi- 
tor of GS by competing with glutamate for binding at the catalytic 
site—at previously reported concentrations’®. MSO reduced EC 
spheroid sprouting, impaired EC migration in scratch-wound assays 
under treatment with mitomycin C, and decreased lamellipodial area, 
while increasing F-actin levels after latrunculin B wash-out but not 
affecting EC proliferation (Extended Data Fig. 5o-t). Even though 
other (off-target) effects of pharmacological GS inhibition cannot be 
Fig. 3 | Loss of GLUL impairs EC migration through perturbed actin 
dynamics. a-f, Control (a, c) and GLUL^KD (b, d) EC spheroids without 
(a, b) and with (c, d) mitomycin C (MitoC) treatment, and number of 
sprouts per spheroid (e) and total sprout length (f). g, Proliferation of 
control and GLUL^KD ECs, as measured by [³H]thymidine incorporation. 
h, Wound closure upon treatment of control and GLUL^KD ECs with 
mitomycin C. i, Boyden chamber migration for control, GLUL^KD and 
GLUL^KD + rGLUL^OE (overexpression of a shRNA-resistant GLUL mutant) 
ECs, all under mitomycin C treatment. j, Velocity of sparsely seeded 
control and GLUL^KD ECs. k-m, Phalloidin (F-actin) staining of control (k) 
and GLUL^KD (l) ECs (arrows and white dotted lines indicate lamellipodia) 
and quantification of lamellipodial area (m). n-p, F-actin and G-actin 
levels in phalloidin (F-actin)-DNase I (G-actin) double-stained control 
and GLUL^KD ECs (n), and representative images of phalloidin-stained 
control (o) and GLUL^KD (p) ECs. q-u, Phalloidin staining of latrunculin 
B-treated control (q, s) and GLUL^KD (r, t) ECs at time point 0 (q, r) and 
at 1 h after latrunculin wash-out (s, t) and quantification of F-actin levels 
after wash-out (u). v, α-Tubulin levels in GLUL^KD and control ECs. All 
data are mean ± s.e.m., with individual data points shown for e-j, m, n, 
u and v; values of n (independent experiments) are: 4 (e, f), 9 (g, j), 5 (h), 
6 (i, u), 7 (m) and 3 (n, v). NS, P > 0.05, *P < 0.05 according to mixed-models 
R statistics (e, f), Student’s t-test (g, h, j, m, n, u, v) or one-way 
ANOVA with Dunnett’s multiple comparison versus control (i). Exact 
P values are as follows: control versus GLUL^KD + mitoC: <0.0001 (e, f); 
0.7729 (g); 0.0283 (h); control versus GLUL^KD: 0.0093; control versus 
GLUL^KD + rGLUL^OE: 0.5981 (i); 0.0234 (j); 0.0352 (m); F-actin: 0.0467; 
G-actin: 0.584 (n); 0.0007 (u); 0.3491 (v). AU, arbitrary units x 10°. Scale 
bars: 100 µm (a-d), 10 µm (k, l) and 20 µm (o-t). 


formally excluded, MSO phenocopied the GLUL knockdown, which 
suggests that the catalytic site of GS is indispensable in the control of 
EC cytoskeletal homeostasis. 


GS inhibition affects RHOJ activity 
Because small GTPases and their effectors control F-actin levels and 
motility! we explored whether Rho GTPases were downstream targets 
of GS. We focused on RHOJ, because it is mainly expressed in ECs’? 
and because the blocking of endothelial RHOJ has been proposed as a 
novel anti-angiogenesis approach’. It is noteworthy that RHOJ^KD ECs 
fully phenocopied GLUL^KD ECs in terms of decreased mobility and 
barrier function (data not shown). 

Because RHOJ localizes to plasma and organelle membranes to 
become activated'4, and is almost exclusively detected in the mem- 
brane fraction!°, we investigated whether its membrane localization 
and activity were regulated by GS levels. Immunoblotting revealed that 
RHOJ was detectable only in the membrane fraction of ECs (consistent 
with previous findings!>), and that GLUL^KD ECs had decreased 
amounts of RHOJ in the membrane fraction (without concomitant 
increase in the cytosolic fraction, possibly because of proteasomal 
degradation’*) as well as decreased levels of active RHOJ (Fig. 4b, c). 
GLUL knockdown did not clearly affect RHOJ transcript levels (relative 
mRNA levels: 0.99 ± 0.03 in control compared with 0.85 ± 0.05 in 
GLUL^KD; n = 3, P = 0.0282). 

We also explored whether GLUL knockdown affected other Rho 
GTPases in ECs. We focused on the RHOA/B/C-Rho kinase (ROCK)- 
myosin light chain (MLC) axis, because silencing of endothelial RHOJ 
increases signalling of this pathway and induces aberrant F-actin 
stress-fibre formation through an as-yet-undefined mechanism!*’” 
(Fig. 4d). Standard glutathione S-transferase-Rhotekin pull-down 
assays showed that GLUL knockdown increased the activity of RHOA 
and RHOC, but not of RHOB (Fig. 4e-g). Of note, GLUL knockdown, 
much like other stimuli, increased total RHOB levels. We confirmed 
the increase in RHOA activity at the individual-cell level with a 
DORA-RHOA-FRET (DORA, dimerization optimized reporter for 
activation; FRET, fluorescence resonance energy transfer) biosensor 
(Fig. 4h, Extended Data Fig. 6a). We observed that the abnormally 
elevated RHOA activity in retracting lamellipodia in GLUL^KD ECs 
evoked more numerous, but smaller and more short-lived, lamellipodia 
(Fig. 4i), which could contribute to the motility impairment. As pre- 
viously suggested’*, increased RHOA activity in lamellipodia leads 
to local actomyosin contraction through ROCK and phosphorylated 
MLC (pMLC), thereby prematurely retracting the lamellipodium. 
Combining both GLUL and RHOJ knockdown did not further increase 
RHOA activity (data not shown); this confirms that RHOJ silencing by 
itself increased RHOA activity, and suggests that GS primarily acts via 
RHOJ to control RHOA signalling. 

Downstream of Rho GTPases, GLUL^KD and MSO-treated ECs had 
increased ROCK1 and ROCK2 protein levels (Fig. 4j) and enhanced 
ROCK activity, as determined by pMLC protein levels, which were 
similarly induced in both GLUL^KD and RHOJ^KD ECs (Fig. 4k, Extended 
Data Fig. 6b-n). In agreement with this, ROCK inhibitors (Y27632, 
fasudil hydrochloride and H1152 dihydrochloride (data not shown)) 
rescued the phenotype associated with GLUL^KD (Fig. 4l-o, Extended 
Data Fig. 6o-w), whereas myosin light-chain kinase inhibitors (ML7; 
peptide 18) did not (Extended Data Fig. 6x-aa). This suggests that 
MLC phosphorylation through ROCK rather than through myosin 
light-chain kinase is more important in mediating the phenotype associated 
with GLUL^KD in ECs. Therefore, GLUL knockdown reduces the 
membrane localization and activity of RHOJ, while activating RHOA, 
RHOC and ROCK. 

We next explored which of these Rho GTPases interact with GS, 
assuming that such an interaction might facilitate and/or be necessary 
for their activation; we remained mindful, however, that RHOJ can 
negatively regulate the activity of the RHOA-ROCK-MLC axis’*””, 
and hence the loss of a primary interaction of GS with RHOJ could 
indirectly explain the increased levels and/or activity of RHOA, 
ROCK and pMLC upon GLUL knockdown. First, co-immunoprecip- 
itation assays revealed an interaction between endogenous RHOJ and 
GS (Fig. 5a); such an interaction was not observed for RHOA or for 
RHOG, which is the most abundant Rho GTPase in ECs (Extended 
Data Fig. 7a). Second, deletion of the first 20 N-terminal amino acids 
of RHOJ (ΔN20-RHOJ), which mediate its plasma-membrane localization)’, 
reduced the interaction with GS (Extended Data Fig. 7b). Third, 
Fig. 4 | Endothelial GS regulates Rho GTPase activity. a, Effect of 
glutamine and MSO on glutamine-producing activity, measured as the 
percentage enrichment of M + 1 (singly ¹⁵N-labelled) glutamine (Gln) 
and glutamate (Glu), 30 min after the addition of ¹⁵NH₄⁺. b, RHOJ, NaK 
ATPase (membrane marker) and GAPDH (cytosol marker) immunoblots 
in cytosolic (c) and membrane (m) fractions with quantification. 
c, Immunoblot for active and total RHOJ with quantification; RHOJ^KD, 
beads only (Beads) and irrelevant biotinylated peptide (Irr. biotin. pep.) 
are negative controls. d, The pivotal yet incompletely understood role 
of RHOJ in EC migration and stress-fibre formation. e-g, Immunoblots 
for pull-down RHOA (e), RHOB (f) and RHOC (g) activity assays with 
quantification. h, Control and GLUL^KD ECs expressing the DORA-RHOA 
biosensor, with quantification of whole-cell FRET start ratio 
(mean ± s.e.m.; control, n = 12 cells; GLUL^KD, n = 9). Look-up table 
(colour bar) denotes relative RHOA activities (blue, low; red, high). 
i, Kymograph of DORA-RHOA biosensor-expressing ECs, showing 
abnormally short-lived lamellipodia and increased RHOA activity in 
retracting lamellipodia of GLUL^KD ECs (red arrowheads) (representative 
of 13 control and GLUL^KD cells). j, Immunoblots of ROCK1 (left), ROCK2 
(right) and α-tubulin (α-Tub), with quantification. k, pMLC, total MLC 
and α-tubulin immunoblots (for quantification, see Methods). In this 


immunoblotting showed that only RHOJ, and not RHOA or RHOC, 
was predominantly membrane-localized (Extended Data Fig. 7c). 
Fourth, we confirmed the GS-RHOJ interaction with a bimolecular 
fluorescence complementation approach (Extended Data Fig. 7d, e). 
On the basis of the above data, we focused on RHOJ as the most likely 
interaction partner of GS. 

To interact with membrane-localized (active) RHOJ, GS should 
be similarly located; indeed, cell fractionation studies revealed that a 
fraction of GS was membrane-localized (Fig. 5b). Further evidence 
is derived from single-particle tracking data, acquired by photoacti- 
vated localization microscopy imaging (SPT-PALM) combined with 
total internal reflection fluorescence microscopy (TIRF). We traced the 
movement of single GS proteins that were tagged with the photoswitchable 
fluorescent protein mEOS (GS-mEOS). Single GS-mEOS particles 
had a lower diffusion coefficient in the TIRF region (comprising the 
plasma membrane and the immediately adjacent cytoplasm) than did 
free mEOS, which is indicative of an association of GS with membrane 
structures (Fig. 5c, Extended Data Fig. 7f). 
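
To make the comparison of diffusion coefficients concrete, the following sketch estimates a diffusion coefficient from one single-particle track. It is a simplified illustration, assuming free two-dimensional Brownian motion, for which the mean squared displacement (MSD) at lag time τ is 4Dτ; the function names, the track coordinates and the frame interval are hypothetical, and the fitting procedure actually used for the SPT-PALM data is not described in this section.

```python
def mean_squared_displacement(track, lag=1):
    """Mean squared displacement of a 2D track at a given frame lag.

    track: list of (x, y) positions in micrometres, one per frame.
    """
    squared = [
        (x2 - x1) ** 2 + (y2 - y1) ** 2
        for (x1, y1), (x2, y2) in zip(track, track[lag:])
    ]
    if not squared:
        raise ValueError("track is too short for this lag")
    return sum(squared) / len(squared)


def diffusion_coefficient(track, frame_interval_s, lag=1):
    """Estimate D (um^2 per s) from MSD = 4 * D * tau (assumes free 2D diffusion)."""
    tau = lag * frame_interval_s
    return mean_squared_displacement(track, lag) / (4.0 * tau)


# Hypothetical track (micrometres) and frame interval (seconds)
hypothetical_track = [(0.0, 0.0), (0.1, 0.0), (0.1, 0.1), (0.2, 0.1)]
print(diffusion_coefficient(hypothetical_track, frame_interval_s=0.02))
```

Under this model, a particle that transiently associates with membrane structures covers less distance per frame and therefore yields a lower estimated D than a freely diffusing one.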


panel, parenthetical ‘(c)’ denotes ‘corrected for corresponding loading 
control’. l, F-actin levels after latrunculin B wash-out in ECs treated with 
the ROCK inhibitor Y27632. m-o, Effect of Y27632 on spheroid-sprouting 
defect (m), migration defect (n), and lamellipodial area (o). Values in 
l, n, o are relative to untreated non-silenced control (dotted line). Scale 
bar, 25 µm (h). All data are mean ± s.e.m., individual data points are 
shown for a and l-o; values of n (independent experiments) are: 3 
(a, e, f, m, n), 4 (c (MSO), h, k, l), 5 (o), 7 (j), 8 (c (GLUL^KD), g), 13 (b). NS, 
P > 0.05, *P = 0.05, *P < 0.05; one-way ANOVA with Dunnett’s multiple 
comparisons versus 4 mM (a), one-sample t-test (b, c, e, f, g, j, k), Student’s 
t-test (h, n, o), paired Student’s t-test (l) or mixed-models R statistics (m). 
Exact P values are as follows: (Glu) 0.6 mM versus 4 mM: 0.9903; 0.025 mM 
+ MSO versus 4 mM: 0.0968; 0.025 mM versus 4 mM: 0.1943; (Gln) 
0.6 mM versus 4 mM: 0.4518; 0.025 mM + MSO versus 4 mM: 0.9999; 
0.025 mM versus 4 mM: 0.0143 (a); 0.0072 (b); MSO: 0.0323; GLUL^KD: 
0.0095 (c); 0.053 (e); 0.1790 (f); 0.0035 (g); 0.0055 (h); ROCK1 MSO: 
0.0169; ROCK1 GLUL^KD: 0.0138; ROCK2 MSO: 0.0381; ROCK2 GLUL^KD: 
0.0802; (k) MSO: 0.0283; GLUL^KD: 0.0431; RHOJ^KD: 0.0091 (j); 0.0431 
(l); GLUL^KD versus control: <0.0001; GLUL^KD + Y27632 versus control 
+ Y27632: 0.5211 (m); 0.0181 (n); 0.0210 (o). For gel source images, see 
Supplementary Fig. 1. 


Palmitoylation of GS and RHOJ 

Membrane localization often requires post-translational palmitoyla- 
tion. We therefore proposed that GS could be palmitoylated to enable 
plasma membrane localization and interaction with RHOJ. We per- 
formed click chemistry with biotin azide (Extended Data Fig. 7g) 
on lysates from HEK293 cells that overexpressed GS and had been 
treated with the clickable palmitoylation probes 16C-BYA or 16C-YA. 
Subsequent streptavidin pull-down showed that both probes labelled 
GS, which was a clear indication that GS had been palmitoylated. The 
labelling was reduced by MSO, which is consistent with the presumed 
dependency of the phenotype on the catalytic site of the enzyme 
(Fig. 5d). 

The palmitoylation of GS has previously been reported anecdotally; 
however, in-depth molecular and functional characterization was not carried 
out””. To determine whether GS undergoes autopalmitoylation, we 
incubated purified GS?! with palmitoyl-alkyne coenzyme A (palmitoyl- 
alkyne CoA; a substrate for palmitoylation) in a cell-free system with- 
out any other proteins present, to demonstrate a direct effect. Click 
Fig. 5 | GS (auto)-palmitoylation. a, Co-immunoprecipitation of 
endogenous RHOJ and GS in ECs. Top, immunoprecipitation of RHOJ; 
bottom, immunoprecipitation of GS. b, Immunoblot for GS and RHOJ in 
cytosolic (c) and membrane (m) fractions in ECs with NaK and GAPDH 
as fraction markers. c, Diffusion coefficient of single photoswitchable 
fluorescent protein mEOS and mEOS-fused GS (mEOS-GS) particles in 
the plasma-membrane region of ECs acquired by SPT-PALM under TIRF 
illumination (n= 41 cells expressing mEOS and 37 expressing mEOS-GS). 
d, GS immunoblotting after streptavidin pull-down of biotin-azide-clicked 
lysates from HEK-293T cells for the indicated palmitoylation probes. 
Input shows levels of GS overexpression. e, Effect of the concentration 

of palmitoyl-alkyne CoA on the autopalmitoylation of purified GS; 
biotin-azide clicking and HRP-streptavidin blotting are shown, with 


chemistry revealed that increasing the dose of palmitoyl-alkyne CoA 
resulted in increased autopalmitoylation of GS (Fig. 5e). Importantly, 
autopalmitoylation of GS was achieved with physiological concentrations 
of palmitoyl-alkyne CoA (1-10 µM) at neutral pH, which suggests 
physiologically relevant autopalmitoylation; this was subsequently confirmed 
with two alternative methods (Supplementary Discussion 2 and 
Extended Data Fig. 7h-j). 

Palmitoylation of target proteins by palmitoyl-acyl transferases is a 
two-step reaction, requiring first autopalmitoylation of the palmitoyl- 
acyl transferase, followed by transfer of the palmitoyl group to the target 
protein. We considered that GS could have a similar activity profile 
(Supplementary Discussion 3) and explored whether it was involved 
in the palmitoylation of RHOJ. Even though the cysteines at positions 
3 (C3) and 11 (C11) of RHOJ were predicted by in silico methods to 
be high-fidelity palmitoylation sites (screened with SwissPalm”, data 
not shown), the palmitoylation of RHOJ has been poorly documented, 
with the exception of a few studies?>4, Notably, the membrane- 
localization and activity of RHOJ were reduced by treatment of ECs 
with the pan-palmitoylation inhibitor 2-bromopalmitate and by 
introducing point mutations in C3 and C11 (Fig. 5f, Extended Data 
Fig. 7k-t), providing initial evidence that RHOJ can be palmitoylated 
in ECs. Using the palmitoylation probe 17-ODYA (Fig. 5g) or an acyl- 
resin-assisted capture (Extended Data Fig. 7u), we found a reduction in 
the levels of palmitoylated RHOJ upon blocking GS, which is consistent 
with a model whereby GS sustains palmitoylation of RHOJ. 


Discussion 

We have found that GS is active in the regulation of EC motility. This 
activity is presumably independent of glutamine synthesis, although 
we cannot formally exclude a possible contribution of minimal levels 
of glutamine production by GS to the observed phenotype. GS was 
found to regulate RHOJ signalling in cell motility, as shown by several 
forms of evidence. First, a fraction of GS is present in EC membranes, 
which is where active RHOJ resides. Second, GS interacts with RHOJ 
in ECs in co-immunoprecipitation experiments, although this inter- 
action could be direct or indirect. Third, GLUL knockdown reduces 
the input shown on Coomassie-stained gel. f, Immunoblotting for 
RHOJ, NaK and GAPDH in membrane (m) and cytosolic (c) fractions 
of control- and 2-bromopalmitate (2BP)-treated ECs. g, Palmitoylation 
of RHOJ in GLUL^KD, MSO- and 2BP-treated ECs. In-gel fluorescence 
for tetramethylrhodamine (TAMRA)-azide 17-ODYA (palmitoylation 
probe)-clicked Flag-RHOJ is shown (Flag as loading control). 2BP is a 
pan-palmitoylation inhibitor. All data are mean ± s.e.m., except box and 
whisker (running from minimal to maximal values) plots in (c), for which 
individual data points are shown; values of n (independent experiments) 
are: 2 (e), 3 (a, b, c, d, f), 4 (g). NS, P > 0.05, *P < 0.05; Student's t-test (c); 
one-sample t-test (f, g). Exact P values are as follows: <0.0001 (c); 0.0264 
(f); MSO: 0.0317; GLUL^KD: 0.0003; 2BP: 0.0163 (g). For gel source images, 
see Supplementary Fig. 1. 


the palmitoylation of RHOJ, its membrane localization and its activity 
in ECs. Therefore, because RHOJ promotes EC motility'*!’, the 
impaired migration of GLUL^KD ECs could be attributed to a reduction 
in RHOJ activity. RHOJ probably also indirectly contributes to promot- 
ing EC motility by controlling the activity of the RHOA-ROCK-MLC 
signalling pathway, which is known to regulate EC motility by affecting 
stress-fibre formation!*!” (Extended Data Fig. 7v, Supplementary 
Discussion 4). 

Because purified GS seems to be capable of autopalmitoylation— 
a feature of palmitoyl-acyl transferase enzymes—and because GLUL 
silencing decreases the palmitoylation of RHOJ, our data support a 
model whereby GS first palmitoylates itself and then transfers the pal- 
mitoyl group to RHOJ, although we cannot formally exclude the pos- 
sibility that transfer of the palmitoyl group from GS to RHOJ occurs 
via additional partners or even non-enzymatically. A possible model 
for GS palmitoylation is described in Supplementary Discussion 5, 
Extended Data Fig. 8 and Extended Data Table 2. In addition, whether 
GS interacts exclusively with RHOJ or whether it can interact with other 
proteins (for example, other palmitoylated Rho GTPases such as RAC1, 
CDC42, RHOU or RHOV) to mediate this effect on EC motility is yet 
to be clarified. In any case, RHOJ seems to be a critical target of GS, 
given that its silencing completely phenocopies GS inhibition in ECs. 

Finally, GS is critical for EC motility and migration, which contributes 
to the formation of new vessels in development and disease. By contrast, 
ECs do not migrate when they are quiescent in healthy adults, which 
explains why GS inhibition has no observable effects on the vasculature 
in healthy adult mice. This renders GS an attractive disease-restricted 
target for the therapeutic inhibition of pathological angiogenesis. 
Furthermore, the pharmacological GS blocker MSO reduced patho- 
logical angiogenesis in blinding eye and psoriatic skin disease (Fig. 2), 
which warrants further exploration of GS targeting in anti-angiogenesis. 
METHODS 


Chemicals and reagents. The GS inhibitor L-methionine sulfoximine (MSO), 
mitomycin C, latrunculin B, oligomycin, antimycin A, carbonyl cyanide-4- 
(trifluoromethoxy)phenylhydrazone (FCCP), 2-bromohexadecanoic acid 
(2-bromopalmitic acid, 2BP), tamoxifen, palmitoyl-CoA agarose and α-ketoglutarate 
dehydrogenase were from Sigma-Aldrich. 17-Octadecynoic acid (17-ODYA) and 
biotin-azide were purchased from Cayman Chemical. The use and/or synthesis 
of the other palmitoylation probes 15-hexadecynoic acid (16C-YA; a palmitate- 
based probe that binds a broader spectrum of proteins than does 16C-BYA 
(below), including both palmitoyl-acyl transferases and palmitoyl-acyl transferase 
target proteins) and 2-bromooctadec-15-ynoic acid (16C-BYA; a 2-bromopalmitate-based 
activity-based probe that labels but also inhibits palmitoyl-acyl 
transferase enzymes) has previously been described”’. The ROCK kinase inhibitor 
Y27632 ((R)-(+)-trans-4-(1-aminoethyl)-N-(4-pyridyl)cyclohexanecarbox- 
amide) was from BioVision, fasudil hydrochloride and H1152 dihydrochloride 
were from Tocris. The myosin light-chain kinase inhibitors ML7-hydrochloride 
and peptide 18 were from Tocris. Collagen type 1 (rat tail) was obtained from 
Merck Millipore. [5-³H]glucose, [³H]thymidine and [U-¹⁴C]glutamine were from 
Perkin Elmer; [6-¹⁴C]D-glucose was from ARC. [U-¹³C]glucose, [U-¹³C]glutamine, 
[U-¹³C]glutamate and ¹⁵NH₄Cl were purchased from Cambridge Isotope 
Laboratories. The following primary antibodies or dyes were used (dilutions for 
staining (ST), immunoblotting (IB), immunofluorescence (IF) and immunopre- 
cipitation (IP) are given between brackets): Griffonia simplicifolia (GS)-IB4-Alexa 
488 (ST 1:200), isolectin GS-IB4-Alexa 568 (ST 1:200), isolectin GS-IB4-Alexa 
647 (ST 1:200), phalloidin-Alexa 488 (ST 1:100), DNase I-Alexa 594 (ST 1:200) 
(Molecular Probes), anti-collagen IV (2150-1470) (IF 1:400) (BioRad), anti-NG2 
chondroitin sulfate proteoglycan (AB5320) (IF 1:200) (Millipore), anti-Flag (clone 
M2) (IB 1:1,000; IP 5 µg ml⁻¹), anti-GS (clone 2B12) (IB 1:1,000; IP 2-5 µg ml⁻¹), 
anti-RHOJ (clone 1E4) (IB 1:1,000; IP 2-5 µg ml⁻¹), anti-ROCK1 (HPA007567) 
(IB 1:1,000), anti-α-tubulin (T6199) (IB 1:1,000) (Sigma-Aldrich), anti-β-actin 
(13E5) (IB 1:1,000), anti-phospho-myosin light-chain 2 (IB 1:1,000; IF 1:300) and 
anti-myosin light-chain 2 (IB 1:1,000) (9776), anti-Na,K-ATPase (NaK) (3010) (IB 
1:1,000), anti-RHOA (67B9) (IB 1:1,000) and anti-RHOC (D40E4) (IB 1:1,000) 
(Cell Signaling Technology), anti-CD105/endoglin (AF1320) (IF 1:50), anti-VE- 
cadherin (AF1002) (IF 1:50) (R&D Systems), anti-ROCK2 (A300-047A-T) (IB 
1:500) (Imtec Diagnostics), anti-CD31 (MEC13.3) (IF 1:200), anti-CD34-biotin 
(553732) (IF 1:25) (BD Biosciences), anti-RHOB (sc-180) (IB 1:1,000). Secondary 
Alexa-405, -488, -568 or -647-conjugated antibodies (1:500) were from Molecular 
Probes; other secondary antibodies and immunoglobulin-γ (IgG) controls were 
from Dako. The Click-iT EdU Alexa Fluor 555 Imaging Kit was from Invitrogen. 
Purified bacterial GS was a gift from R. Levine (Bethesda). 

Cell culture. HUVECs and human umbilical artery ECs. The cells were obtained 
under protocol S57123 (Commission Medical Ethics of UZ/KU Leuven) after 
written consent of the donors, were isolated as previously described!” and were 
routinely cultured in M199 medium (Invitrogen) containing 20% FBS, 0.6 mM 
L-glutamine, heparin (10 U ml⁻¹; Sigma), penicillin (100 U ml⁻¹), streptomycin 
(100 µg ml⁻¹) and endothelial cell growth factor supplements (ECGS; 30 mg l⁻¹; 
Sigma). Cells were only used between passages 1 and 4 and all experiments were 
performed in HUVECs from at least three different donors unless stated otherwise. 
Except when stated otherwise, the use of the abbreviation EC in the text refers to 
HUVEC. 

Isolation of endothelial cells from human lung, liver or colon mucosa. Lung, liver 
or colon mucosa specimens were obtained under protocol S57123 (Commission 
Medical Ethics of UZ/KU Leuven) and were washed several times with phosphate 
buffer solution (PBS) and minced with scissors before enzymatic digestion for 45 min 
at 37°C with collagenase/dispase/DNase solution (Gibco, Life Technologies). The 
resulting suspension was passed through a 100-µm nylon mesh (BD Biosciences 
Pharmingen) to remove aggregates. The collected cells were washed, seeded on 
gelatin pre-coated 6-well plates and cultured in complete endothelial growth 
medium (EGM-MV; Lonza) supplemented with antibiotics. After 5-7 days, when 
cells reached confluency, a positive CD31 magnetic-bead selection was performed 
(CD31 MicroBead, 130-091-935, Miltenyi Biotech) according to the manufacturer's 
guidelines, and purified cells were further cultured in EGM medium. 

Blood outgrowth ECs. Cells were established and cultured as previously described”®. 
In brief, blood samples (obtained under protocol S57123 (Commission Medical 
Ethics of UZ/KU Leuven)) were diluted with PBS before Ficoll PaquePLUS (GE 
Healthcare) density-gradient centrifugation at 1,000g for 20 min at room tempera- 
ture. The mononuclear cell layer was collected, washed with PBS and resuspended 
in EGM2 medium (PromoCell). Cells were plated in collagen-coated flasks and the 
medium was replaced every 2 days. From day 7 onwards, cells were checked for the 
formation of colonies, which were allowed to grow to approximately 1 cm. Blood 
outgrowth EC colonies were then trypsinized and subcultured. 
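
Because centrifugation steps in these protocols are specified as relative centrifugal force (for example, 1,000g), converting to a rotor speed requires the rotor radius. The sketch below uses the standard relation RCF = 1.118 × 10⁻⁵ × r × rpm², with r in centimetres; the function names and the 10-cm radius are hypothetical examples, and the rotors actually used are not specified in this section.

```python
import math

# Standard conversion constant for radius in centimetres and speed in rpm
RCF_CONSTANT = 1.118e-5


def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (multiples of g) for a given speed and radius."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Rotor speed (rpm) needed to reach a given RCF at a given radius."""
    if rcf < 0 or radius_cm <= 0:
        raise ValueError("rcf must be non-negative and radius positive")
    return math.sqrt(rcf / (RCF_CONSTANT * radius_cm))


# Hypothetical rotor radius of 10 cm for a 1,000g spin
speed = rpm_from_rcf(1000.0, radius_cm=10.0)
print(f"{speed:.0f} rpm")
```

Reporting forces in g rather than rpm, as done here, keeps the protocol independent of the particular rotor.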

HEK293T and HEPG2 cells. Cells (obtained from the ATCC) were grown in 
DMEM, supplemented with 10% FBS, 100 U ml⁻¹ penicillin and 100 µg ml⁻¹ 
streptomycin. When HEPG2 cells were compared directly to ECs in short-term stable 
isotope tracing experiments, they were incubated in exactly the same medium as 
the ECs to rule out possible bias arising from the difference in medium formula- 
tion. We did not perform authentication of the HEK293T and HEPG2 cells. 
Mouse liver ECs. Cells were isolated from perfused healthy livers of control or 
Glul^ECKO mice. Before perfusion, the mice were anesthetized with Nembutal 
(60 mg kg⁻¹). Mice were perfused with 5 ml of a water-based perfusion buffer 
containing 1.7 M NaCl, 84 mM KCl, 120 mM HEPES and 1 mM NaOH followed 
by 5 ml of a PBS-based digestion buffer containing 0.1% collagenase II (Life 
Technologies), collagenase I (Life Technologies), 2 mM CaCl₂, 1% antibiotic- 
antimycotic (Life Technologies) and 10% FBS (Biochrom) at a perfusion rate of 
1 ml min⁻¹. Perfusion was considered complete when the liver and mesenteric 
vessels were blanched and the desired amount of digestion buffer (>5 ml) had 
passed through the circulatory system. Livers were dissected, placed into a 50 ml 
conical tube with 3 ml of digestion buffer and incubated at 37 °C for approximately 
30 min, with regular shaking of the tubes every 5 min. After digestion, the tissue 
was homogeneously dissociated and the reaction was stopped with 10 ml of iso- 
lation buffer containing PBS + 0.1% BSA (Sigma-Aldrich). Subsequently, the cell 
suspension was filtered through a 100-µm cell strainer and cells were washed twice
with isolation buffer. Finally, the ECs were isolated by magnetic-bead sorting with 
Dynabeads (CELLection Biotin Binder Kit, Life Technologies) coated with anti- 
mouse CD31 (eBioscience, Anti-Mouse CD31 Clone 390), according to the man- 
ufacturer’s instructions. In brief, the cell suspension was incubated with the beads 
at room temperature for 30 min in HulaMixer Sample Mixer (Life Technologies). 
Next, CD31⁺ ECs were collected by putting the tubes on a DynaMag-50 Magnet
(Life Technologies) and removing the supernatant. The procedure was repeated 
twice to remove cell debris. Finally, cells were resuspended in EGM2 medium 
(PromoCell) and plated at the desired density on cell-culture plates pre-coated 
with 0.1% gelatin, and grown to confluency. 

Mouse astrocytes. The cells were prepared as previously described with minor 
changes. In brief, spinal cords were dissected from 13-day-old C57BL/6J mouse
embryos. Meninges and dorsal root ganglia were removed and a single cell pop- 
ulation was obtained by digestion with 0.05% trypsin in combination with gentle 
trituration. The cell suspension was layered on a 6.2% OptiPrep (Axis-Shield) 
cushion and centrifuged at 500g for 15 min. The pellet was resuspended and the 
cells were plated (12,000 cells per cm²) in L15 medium supplemented with glucose
(3.6 mg ml⁻¹), sodium bicarbonate (0.2%), penicillin (100 IU ml⁻¹), streptomycin
(100 µg ml⁻¹) and FBS (10%). After reaching confluency, cell division was halted
by treatment with cytosine arabinoside (10 µM, 3 days). After 4 weeks, more than
95% of cells stained positive for glial fibrillary acidic protein (not shown). We 
routinely tested primary cells and cell lines for mycoplasma contamination with 
the MycoAlert mycoplasma detection kit (Lonza, LT07-418). 

Plasmid constructions and lentiviral particle production. cDNA for human 
GLUL was obtained from Origene. Silent mutations were introduced to make the 
GLUL cDNA resistant to the GLUL-specific shRNA (see below, TRCN0000045628). 
Constructs with point mutations were generated with Stratagene’s QuikChange
site-directed-mutagenesis kit following the manufacturer’s guidelines. The cDNA
for RHOJ-eGFP (GFP-TCL) was a gift from C. Der (Addgene plasmid 23231) and
was used as a template to generate the N-terminal-truncated ΔN20-RHOJ-eGFP,
which lacked the first 20 amino acids, and Flag-tagged RHOJ. Standard cloning
techniques were used to fuse GS to the photoswitchable fluorescent protein mEOS
(pRSETa-mEos2 was a gift from L. Looger; Addgene plasmid 20341). The
bimolecular fluorescence complementation (BiFC) approach vector enabling
simultaneous expression of two separate cDNAs fused to eGFP subfragment 1
(N-terminal; containing amino acids 1 to 158) or subfragment 2 (C-terminal;
containing amino acids from 159 onwards), respectively, was a gift from H. Mizuno
(KU Leuven). GS was fused to the N-terminal subfragment of eGFP and RHOJ was
fused to the C-terminal subfragment of eGFP to generate GS-eGFP^1/2 and
RHOJ-eGFP^2/2.
Lentiviral expression constructs were obtained by cloning the respective cDNAs 
into pRRLsinPPT.CMV.MCS MM WPRE-vector. Validated GLUL-specific (TRC
clones TRCN0000045628 (used in the majority of the experiments and indicated as
GLUL^KD1 in Extended Data Fig. 2a) and TRCN0000045631 (indicated as GLUL^KD2
in Extended Data Fig. 2a and only used to confirm the migration and lamellipodial
defect in Extended Data Fig. 3a, b)) and RHOJ-specific (TRCN0000047606) shRNAs
were either used in the pLKO.1 vector or subcloned into the pLVX-shRNA2
vector (PT4052-5; Clontech, Westburg BV). Scrambled shRNAs or the empty vectors
were used as negative controls (both with the same outcome). All constructs
were sequence-verified. Lentiviral particles were produced in HEK293T cells as
previously described.

Recombinant protein production. Template vectors pRRLhGS, pRRLhG 
and pRRLhGS®#!¢ containing the genes encoding wild-type GLUL or GLUL with 
point mutations were used as templates for PCR-based cloning. Recombinant con- 
structs were expressed in the Escherichia coli strain BL21 codon + pICA2 that 
was transformed with pLH36-hGS in which expression is induced by isopropyl 
β-D-1-thiogalactopyranoside under control of a pL-promoter developed by the
Protein Core of VIB. The pLH36 plasmid is provided with a His6-tag followed
by a murine caspase-3 site. The murine caspase-3 site can be used for the removal 
of the His6-tag attached at the N terminus of the protein of interest during puri- 
fication. The transformed bacteria were grown in 200 ml Luria Bertani medium 
supplemented with ampicillin (100 µg ml⁻¹) and kanamycin (50 µg ml⁻¹) overnight
at 28°C before 1/100 inoculation in a 20 l fermenter provided with Luria Bertani
medium supplemented with ampicillin (100 µg ml⁻¹) and 1% glycerol. The initial
stirring and airflow were 200 r.p.m. and 1.5 l min⁻¹, respectively. Thereafter, these
were automatically adapted to keep the pO₂ at 30%. The temperature was kept
at 28°C. The cells were grown to an optical density at 600 nm (OD₆₀₀) of 1.0,
transferred to 20°C, and expression was induced by addition of 1 mM isopropyl
β-D-1-thiogalactopyranoside overnight. Cells were then collected and frozen at
−20°C. After thawing, the cells were resuspended at 3 ml g⁻¹ in 50 mM HEPES
pH 7.5, 500 mM NaCl, 20 mM imidazole, 1 mM phenylmethylsulfonyl fluoride, 10%
glycerol, 5 mM β-mercaptoethanol, 1 mg per 100 ml DNase I (Roche) and 1 tablet
per 100 ml Complete Protease Inhibitor (Roche). The cytoplasmic fraction was 
prepared using the Emulsiflex C3 (Avestin) followed by centrifugation. All steps 
were conducted at 4°C. The clear supernatant was applied to a 20 ml Ni-Sepharose 
6 FF column (GE Healthcare), equilibrated with 50 mM HEPES pH 7.5, 500 mM 
NaCl, 20 mM imidazole, 10% glycerol, 5 mM β-mercaptoethanol and 1 mM
phenylmethylsulfonyl fluoride. The column was eluted with 50 mM HEPES pH 7.5,
500 mM NaCl, 400 mM imidazole, 10% glycerol, 5 mM β-mercaptoethanol and
1 mM phenylmethylsulfonyl fluoride after an intermediate elution step with 50 mM
imidazole in the same buffer. Finally, the elution fraction was injected on a HiLoad 
26/60 Superdex prep grade column with 20 mM HEPES pH 7.5, 300 mM NaCl, 
10% glycerol and 0.5 mM tris(2-carboxyethyl)phosphine as running solution. The 
obtained elution fractions were analysed by SDS-PAGE. Recombinant protein 
concentration was determined using the Micro-BCA assay (Pierce). 

In vitro knockdown and overexpression strategies. To minimize off-target 
effects and other silencing artefacts, key findings were confirmed with at 
least two independent and validated GLUL-specific shRNAs (see above) 
and appropriate controls or with a GLUL-specific siRNA duplex
(5′-GGAAUAGCAUGUCACUAAAGCAGGC-3′) and scrambled control (TriFECTa, IDT). For
lentiviral transduction of shRNAs or overexpressing constructs a multiplicity of 
infection (MOI) of 10 or 5 was used, respectively. In the case of simultaneous 
transduction of two different shRNAs, a MOI of 7.5 was used for each individual 
shRNA. In the case of simultaneous transduction of a shRNA in combination with 
an overexpression construct, the shRNA was transduced at a MOI of 10 and the 
overexpression construct at a MOI of 5, except for overexpression constructs for 
shRNA-resistant GLUL which were transduced at a MOI of 2.5. Transductions were 
performed on day 0 in the evening, cells were refed with fresh medium on day 1 
in the morning and experiments were performed from day 3 or 4 onwards. siRNA 
transfection mixtures (in a total volume of 500 µl) were prepared in Opti-MEM
containing GlutaMAX-I (Invitrogen) with Lipofectamine RNAi Max transfection 
reagent (Invitrogen) according to the manufacturer's instructions. The mixtures 
were added to the cells (150,000 cells in a 6-well-format plate) together with 2 ml 
EBM2 without antibiotics for overnight transfection, after which the medium was 
changed back to the regular M199 culture medium. siRNA transfection was per- 
formed at least 48 h before functional assays. BiFC plasmids were transfected into 
HEK293T cells with Fugene HD transfection reagent following the manufacturer's 
guidelines. Knockdown efficiency and overexpression levels were closely moni- 
tored for each experiment either on the mRNA (by PCR with reverse transcription; 
RT-PCR) or the protein level. 
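
As a worked illustration of the MOI settings above: an MOI is the number of transducing units (TU) added per cell, so the volume of viral stock needed per well follows directly from the titre. The sketch below is a minimal example under stated assumptions; the function name, the cell number and the titre are hypothetical placeholders and are not taken from this study.

```python
def virus_volume_ml(moi: float, n_cells: int, titre_tu_per_ml: float) -> float:
    """Volume of viral stock (ml) that delivers `moi` transducing units per cell."""
    if titre_tu_per_ml <= 0:
        raise ValueError("titre must be positive")
    return moi * n_cells / titre_tu_per_ml


# MOIs as quoted in the text; cell number and titre are assumed values.
N_CELLS = 100_000   # hypothetical number of cells per well
TITRE = 1e8         # hypothetical functional titre, TU per ml

volumes_ml = {
    "shRNA alone (MOI 10)": virus_volume_ml(10, N_CELLS, TITRE),
    "overexpression alone (MOI 5)": virus_volume_ml(5, N_CELLS, TITRE),
    "each of two shRNAs (MOI 7.5)": virus_volume_ml(7.5, N_CELLS, TITRE),
    "shRNA-resistant GLUL (MOI 2.5)": virus_volume_ml(2.5, N_CELLS, TITRE),
}

for label, vol in volumes_ml.items():
    print(f"{label}: {vol * 1000:.1f} µl")
```

In practice each viral preparation has its own titre, so the volume has to be recalculated per batch.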

RNA isolation and gene expression analysis. Total RNA was extracted with 
PureLink RNA mini kit (Invitrogen) according to the manufacturer's instructions; 
quality and quantity were measured on a Nanodrop (Thermo Scientific). cDNA
synthesis was performed with the iScript cDNA synthesis kit (BioRad). qPCR 
analyses were performed as previously described on an Applied Biosystems 7500
Fast device with in-house-designed primers and probes or premade primer sets 
(Applied Biosystems or Integrated DNA Technologies) for which sequences and/or 
primer set ID numbers are available upon request. ENOX2 or HPRT1 were used 
as housekeeping genes. 

Western blotting and (co-)immunoprecipitation. Proteins were extracted in 
Laemmli buffer (125 mM Tris-HCl (pH 6.8), 2% SDS, 10% glycerol) or in RIPA 
buffer (25 mM Tris-HCl (pH 7.6), 150 mM NaCl, 1% NP-40, 1% sodium deoxy- 
cholate, 0.1% SDS) containing protease and phosphatase inhibitor mixes (Roche 
Applied Science). After shearing of genomic DNA, proteins in the lysates were 
separated by SDS-PAGE, transferred to nitrocellulose or polyvinylidene difluoride 
membranes and detected with specific antibodies and HRP-conjugated second- 
ary antibodies in combination with enhanced chemiluminescence or SuperSignal 
Femto western blotting substrate (Thermo Scientific). Signal was acquired with 
Image Quant LAS 4000 V 1.2 and densitometric quantification was carried out 
with ImageJ. For MLC and pMLC immunoblotting, each sample was loaded on two
separate gels. One gel was used to detect MLC and the second was used to detect
pMLC. Both gels had their own loading control, namely α-tubulin. pMLC/MLC
was quantified as follows: (pMLC/α-tubulin)/(MLC/α-tubulin), abbreviated in the
figure panel as (c)pMLC/(c)MLC with (c) meaning ‘corrected for corresponding
loading control’. Membrane versus cytosolic protein fractions were purified with
the Plasma Membrane Protein Extraction Kit (101Bio) according to the manu- 
facturer’s guidelines and using proprietary buffers. For co-immunoprecipitation 
(co-IP) of endogenous or overexpressed proteins, ECs were lysed by rotating at 
4°C for at least 4 h in co-IP lysis buffer (20 mM Tris-HCl pH 8, 137 mM NaCl,
10% glycerol, 1% nonidet NP-40 and 2 mM EDTA). Equal amounts of protein
were incubated overnight with specific antibodies or matching isotype control
IgGs at 4°C. Subsequently, 20 µl of protein A/G-sepharose beads was added to the
immune complexes for 4 h at 4°C under gentle rotation. The beads were pelleted, 
washed three times with ice-cold co-IP lysis buffer and boiled for 5 min in reducing 
agent and loading buffer before SDS-PAGE. To determine the effect of deleting the 
first 20 N-terminal amino acids of RHOJ on its interaction with GS, co-IPs were 
carried out as above on ECs simultaneously overexpressing GS and RHOJ-eGFP 
or ΔN20-RHOJ-eGFP. In some of the experiments, the expression of
ΔN20-RHOJ-eGFP was lower than the expression of RHOJ-eGFP. To correct for this
possible bias, densitometric quantification of all bands was performed in ImageJ
and signals in the immunoprecipitation lanes were normalized to the input signals.
The amount of GS immunoprecipitated was the same under the RHOJ-eGFP and
ΔN20-RHOJ-eGFP conditions (data not shown).
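
The two densitometric normalizations described in this section (the loading-control-corrected pMLC/MLC ratio, and the immunoprecipitation signal normalized to input) can be sketched as follows. This is a minimal illustration that assumes band intensities have already been exported from ImageJ as plain numbers; the function names are hypothetical.

```python
def loading_corrected(band: float, loading_control: float) -> float:
    """Band intensity divided by the loading control measured on the same gel."""
    if loading_control <= 0:
        raise ValueError("loading-control intensity must be positive")
    return band / loading_control


def pmlc_over_mlc(pmlc: float, tubulin_pmlc_gel: float,
                  mlc: float, tubulin_mlc_gel: float) -> float:
    """(pMLC/alpha-tubulin) / (MLC/alpha-tubulin), each with its own gel's tubulin."""
    return (loading_corrected(pmlc, tubulin_pmlc_gel)
            / loading_corrected(mlc, tubulin_mlc_gel))


def ip_normalized_to_input(ip_signal: float, input_signal: float) -> float:
    """Co-IP band intensity relative to the matching input band."""
    if input_signal <= 0:
        raise ValueError("input intensity must be positive")
    return ip_signal / input_signal


# Illustrative intensities (arbitrary units), not data from the study.
print(pmlc_over_mlc(pmlc=300, tubulin_pmlc_gel=100, mlc=400, tubulin_mlc_gel=200))
```

Because each gel carries its own α-tubulin band, differences in loading or transfer between the two gels cancel within each corrected term before the ratio is taken.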

Biochemical and metabolic assays. Bicinchoninic acid assay. The BCA protein 
assay kit (Pierce) was used to determine protein content with Gen5 1.11.5 (BioTek 
Instruments). 

Lactate dehydrogenase release. This was determined as a measure of cell survival 
using the Cytotoxicity Detection Kit (Roche Applied Science) with Gen5 1.11.5 
(BioTek Instruments). 

Intracellular reactive oxygen species. Levels were determined by CM-H₂DCFDA dye
(5-(and-6)-chloromethyl-2’,7’-dichlorodihydrofluorescein diacetate, acetyl ester; 
Invitrogen) labelling following the manufacturer's guidelines. 

Glutamine synthetase activity. The enzyme activity in living cells was determined by 
pulse-labelling the cells for 30 min with 2 mM ¹⁵NH₄Cl and subsequent determination
of ¹⁵N incorporation in intracellular glutamine by gas chromatography–mass
spectrometry (GC-MS). Similarly, GS activity was measured by pulse-labelling for
30 min with 0.5 mM [U-¹³C]glutamic acid and subsequent tracing of ¹³C into
glutamine by GC-MS. The 0.025 mM glutamine condition was added to this assay for
the sole purpose of having a positive control (lowering external glutamine levels
should increase GS activity) and does not in any way reflect maximal GS activity.
Background signals were determined by pre-incubating the cells with the GS
inhibitor MSO. As an independent method (not relying on labelling one of the
immediate substrates (NH₄⁺ or glutamate)) of determining GS activity, we performed
steady-state labelling of ECs with [U-¹³C]glucose (5.5 mM) and determined the
¹³C contribution to α-ketoglutarate, glutamate and glutamine (for labelling scheme,
see Extended Data Fig. 5f). Before derivatization for GC-MS analysis, cells were 
washed with ice-cold 0.9% NaCl and extracted in ice-cold 80/20 methanol/water.

Glutamine uptake assay. Dynamic [U-¹³C]glutamine uptake assays were performed
as follows: 2.5 x 10° cells per well were seeded in 6-well plates and pulse-labelled 
for 0, 0.5, 10, 20 and 30 min with the regular M199 culture medium containing
0.6 mM [U-¹³C]glutamine instead of the regular 0.6 mM unlabelled glutamine.
The 0-min time point represents an absolute negative control for which
extracts were made from ECs that had never been treated with tracer-containing 
medium. For the 0.5-min time point, the labelled medium was put on the cells and 
immediately aspirated (altogether taking 0.5 min). At all time points, cells were 
thoroughly washed twice with ice-cold 0.9% NaCl to ensure complete removal 
of tracer-containing medium. Cellular extracts were then made in ice-cold 
80/20 methanol/water, before derivatization for GC-MS measurements. 
Alternatively, cells were incubated with 0.5 µCi ml⁻¹ [U-¹⁴C]-L-glutamine for 10 min,
after which they were washed at least three times with ice-cold PBS. The last PBS
wash was collected and checked for residual radioactivity. Cells were then lysed
with 200 µl of 0.2 M NaOH and lysates were neutralized with 20 µl of 1 M HCl and used
for scintillation counting. 

[³H]thymidine incorporation. Proliferation was determined by labelling the cells with
1 µCi ml⁻¹ [³H]thymidine for 2 h, followed by fixation in 100% ethanol for 15 min,
precipitation with 10% trichloroacetic acid and finally lysis in 0.1 M NaOH.
Scintillation counting was used to assess the amount of [³H]thymidine incorporated
into the DNA.

Energy charge assessment. Cells (1.5 × 10⁶) were collected in 100 µl ice-cold 0.4 M
perchloric acid containing 0.5 mM EDTA. The pH was adjusted with 100 µl of 2 M
K₂CO₃. One hundred microlitres of the mixture was subsequently injected onto an
Agilent 1260 HPLC with a C18-Symmetry column (150 × 4.6 mm, 5 µm; Waters),
maintained at 22.5°C. The flow rate was kept constant at 1 ml min⁻¹. A linear
gradient using solvent A (50 mM NaH₂PO₄, 4 mM tetrabutylammonium, adjusted to
pH 5.0 with H₂SO₄) and solvent B (50 mM NaH₂PO₄, 4 mM tetrabutylammonium,
30% CH₃CN, adjusted to pH 5.0 with H₂SO₄) was accomplished as follows: 95% A
for 2 min, from 2 to 25 min linear increase to 100% B, from 25 to 27 min isocratic
at 100% B, from 27 to 29 min linear gradient to 95% A and finally from 29 to
35 min at 95% A. ATP, ADP and AMP were detected at 259 nm.
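
The text does not state how the three nucleotide readings were combined. A common definition is the adenylate energy charge, (ATP + 0.5 × ADP)/(ATP + ADP + AMP). The sketch below assumes that definition and assumes the HPLC peak areas have already been converted to molar amounts; the function name is hypothetical.

```python
def energy_charge(atp: float, adp: float, amp: float) -> float:
    """Adenylate energy charge: (ATP + 0.5*ADP) / (ATP + ADP + AMP)."""
    total = atp + adp + amp
    if min(atp, adp, amp) < 0 or total <= 0:
        raise ValueError("nucleotide amounts must be non-negative and not all zero")
    return (atp + 0.5 * adp) / total


# Illustrative amounts (same units for all three), not data from the study.
print(round(energy_charge(atp=8.0, adp=1.0, amp=1.0), 3))
```

The value runs from 0 (all AMP) to 1 (all ATP), so it can be compared between conditions regardless of the total amount of adenylate recovered.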

Seahorse extracellular flux measurements. ECs were seeded at 1.5 x 10° cells per 
well on Seahorse XF24 tissue culture plates (Seahorse Bioscience Europe). Oxygen 
consumption (OCR) measurements were performed at 6-min intervals (2 min 
mixing, 2 min recovery, 2 min measuring) in a Seahorse XF24 device (XF Reader 
1.8.1.1 software). Consecutive treatments with oligomycin (1.2 µM final), FCCP
(5 µM final) and antimycin A (1 µM final) were performed to enable quantification
of ATP-coupled OCR (OCR_ATP) and maximal respiration, next to basal OCR
(OCR_BAS).
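
The text names the readouts but not the arithmetic behind them. One widely used convention takes ATP-coupled OCR as the drop in OCR after oligomycin, and maximal respiration as the FCCP-stimulated OCR minus the antimycin-A-insensitive (non-mitochondrial) OCR. The sketch below assumes that convention, and assumes basal OCR is reported as measured; both are assumptions and not the authors' stated method.

```python
def respiration_parameters(basal: float, after_oligomycin: float,
                           after_fccp: float, after_antimycin: float) -> dict:
    """Derive OCR readouts from the four plateau values of one well."""
    return {
        "OCR_BAS": basal,                            # basal OCR, as measured
        "OCR_ATP": basal - after_oligomycin,         # oligomycin-sensitive fraction
        "maximal": after_fccp - after_antimycin,     # uncoupled minus non-mitochondrial
    }


# Illustrative OCR plateau values (pmol O2 per min), not data from the study.
params = respiration_parameters(basal=100.0, after_oligomycin=40.0,
                                after_fccp=180.0, after_antimycin=20.0)
print(params)
```

Each plateau value would normally be the mean of the consecutive 6-min measurement cycles recorded after the corresponding injection.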

Glycolytic flux. ECs were cultured for 6 h in medium containing 0.4 µCi ml⁻¹
[5-³H]D-glucose (Perkin Elmer) after which the supernatant was transferred
into glass vials sealed with rubber stoppers. ³H₂O was captured in hanging wells
containing a Whatman paper soaked with H₂O over a period of 48 h at 37°C
to reach saturation. The paper was then used for liquid scintillation counting
(QuantaSmart V4 Perkin Elmer). 

[¹⁴C]glucose oxidation. ECs were incubated for 6 h in medium containing 0.55 µCi ml⁻¹
[6-¹⁴C]D-glucose. After that, 250 µl of 2 M perchloric acid was added to each well
to stop cellular metabolism and to release ¹⁴CO₂, which was captured overnight at
room temperature in 1× hyamine hydroxide-saturated Whatman paper. The radioactivity
in the paper was determined by liquid scintillation counting (QuantaSmart
V4 Perkin Elmer).

[¹⁴C]glutamine oxidation. ECs were incubated for 6 h with medium containing
0.5 µCi ml⁻¹ [U-¹⁴C]glutamine. Two hundred and fifty microlitres of 2 M perchloric
acid was added to the cells to stop cellular metabolism and release ¹⁴CO₂. Trapping
of ¹⁴CO₂ occurred as described for ¹⁴C-glucose oxidation.

Protein (auto)palmitoylation detection. In vitro palmitoylation (click-reaction- 
based). Purified bacterial GS protein was incubated with the indicated concentra- 
tion of palmitoyl alkyne-CoA (Cayman Chemical) for 6 h at room temperature. 
The GS protein was then denatured by the addition of SDS. A click reaction with 
biotin azide was performed to label the palmitoylated proteins. Palmitoylated
proteins were detected by SDS-PAGE followed by blotting with streptavidin- 
horseradish peroxidase. 

Fluorescence-based CoA release detection. During autopalmitoylation of proteins, 
palmitate is transferred from palmitoyl-CoA to the protein, thereby releasing 
reduced CoA. α-Ketoglutarate dehydrogenase can use CoA to convert α-ketoglutarate
to succinyl-CoA, a reaction that features the reduction of NAD⁺ to fluorescent
NADH. In brief, recombinant human GS was incubated with palmitoyl-CoA
in MES buffer at physiological pH for at least 1 h at 30°C. The volume was then
adjusted to 200 µl in 50 mM sodium phosphate buffer (pH 6.8) containing 2 mM
α-ketoglutaric acid, 0.25 mM NAD⁺, 0.2 mM thiamine pyrophosphate, 1 mM
EDTA, 1 mM dithiothreitol and 32 mU α-ketoglutarate dehydrogenase. NADH
levels were measured at 20 min after initiation of the reaction on a VICTOR plate
reader (340 nm excitation, 465 nm emission). The experiment was performed in
two directions: either with varying doses of palmitoyl-CoA for a fixed amount of
recombinant GS (2 µg) or with varying amounts of recombinant GS for a fixed
concentration of palmitoyl-CoA (40 µM).

Affinity chromatography. A previously published protocol was used to determine 
cell-free binding of recombinant human GS to palmitoyl-CoA agarose. A total 
of 50 µl of immobilized palmitoyl-CoA-agarose was equilibrated with 20 mM
Tris-HCl (pH 8.4)/120 mM NaCl buffer. The beads were incubated with 40 µg of
recombinant human GS in a final volume of 200 µl for 2 h at room temperature on
a rotatory system. Beads were pelleted and 20 µl of the supernatant was collected
as the flow-through fraction. Beads were then washed eight times with 500 µl of
20 mM Tris-HCl (pH 8.4)/120 mM NaCl buffer. Twenty microlitres of the last wash 
fraction was collected as fraction wash-8. Beads were then eluted with SDS loading 
buffer and heated for 15 min at 60°C. Two micrograms of recombinant protein 
was used as input fraction. Input fraction, flow-through, wash-8 and SDS-eluate 
were analysed by immunoblotting for the presence of GS. 

In-cell labelling. In-cell labelling experiments were performed essentially as
previously described. HEK-293T cells were transfected with the indicated expression
plasmids. Twenty-four hours after transfection, the medium was replaced with
DMEM + 10% dialysed FBS containing the indicated probes (50 µM 16C-YA or
50 µM 16C-BYA). After 18 h, cell lysates were collected by incubation of the cells
on ice for 15 min in lysis buffer (50 mM TEA-HCl (pH 7.4), 150 mM NaCl, 1%
Triton X-100, 0.5% sodium deoxycholate, 0.1% SDS and 5 mM phenylmethanesulfonyl
fluoride) followed by centrifugation for 10 min at 15,000g. Equal amounts
of protein were then used for a click reaction with biotin azide. For labelling
with 17-ODYA, Flag-RHOJ overexpressing ECs were incubated overnight with
17-ODYA (50 µM) in M199 supplemented with 3.6% fatty-acid-free BSA, 10%
dialysed FBS and 5 mM sodium pyruvate. Cells were washed with ice-cold PBS
and lysed in NaP lysis buffer (0.2 M Na₂HPO₄·2H₂O, 0.2 M NaH₂PO₄·2H₂O, 1 M
NaCl, 10% NP40). Two micrograms of anti-Flag antibody was conjugated to 20 µl
of Dynabeads protein G (Thermo Fisher) for 1 h at room temperature. After washing
the beads twice with NaP lysis buffer, at least 500 µg of protein was added to the
beads for 3 h at 4°C. Then beads were washed three times with NaP lysis buffer
and resuspended in 20 µl of resuspension buffer (4% SDS, 50 mM TEA, 150 mM
NaCl). The click reaction was initiated by adding 0.5 µl of 5 mM TAMRA-azide
(Lumiprobe), 0.5 µl of 50 mM tris(2-carboxyethyl)phosphine hydrochloride (TCEP-HCl),
0.5 µl of 10 mM tris[(1-benzyl-1H-1,2,3-triazol-4-yl)methyl]amine (TBTA) and
2.4 µl of 5 mM freshly prepared ascorbic acid. Samples were then incubated for 1 h
at 37°C in the dark. Sample buffer (9.4 µl) and reducing agent (3.7 µl) were added
to stop the reaction. After 10 min at room temperature in the dark, samples were
frozen at −80°C or run on a 10% Bis-Tris gel in MES buffer. In-gel fluorescence
was imaged with Typhoon FLA 9500 V1.0. 

Streptavidin pull-down. After click reaction with biotin azide, free biotin azide 
was removed from the samples by centrifugal filtration column (Millipore). The 
samples were then incubated with streptavidin-conjugated beads for 1 h at room 
temperature. After washing with PBS-T, proteins were eluted from the beads by 
incubation in elution buffer (95% formamide, 10 mM EDTA (pH 8.0)) at 95°C
for 5 min. 

Acyl-resin-assisted capture. This method, in which free cysteine thiols are chemi- 
cally blocked and palmitoylated cysteines are exposed and captured by a resin, was 
performed with the CAPTUREome S-Palmitoylated Protein Kit (Badrilla) with 
minor adaptations to the manufacturer’s guidelines. Five hundred micrograms 
of protein were incubated for 4 h in 500 µl of thiol blocking reagent (to block free
thiols). Proteins were precipitated with ice-cold acetone and afterwards solubilized
with 300 µl of binding buffer and spun down. After protein quantification, 30 µg
was kept as total input fraction (IF), and equal amounts of protein were incubated 
for 2.5 h with or without (to obtain the negative control preserved bound fraction 
pBF) a thioester linkage specific cleavage reagent to cleave the thioester bond. 
Newly liberated thiols were captured with CAPTUREome resin. The resin was 
spun down and the supernatant was collected as the cleaved unbound fraction 
(cUF) to check whether the proteins of interest were indeed completely depleted 
from the thioester cleavage reagent (meaning efficient capture of the free thiols by 
the resin). After thorough washing of the resin, captured proteins (cleaved bound 
fraction, cBF), were eluted with reductant and analysed together with the input 
fraction, cleaved unbound fraction and preserved bound fraction by SDS-PAGE 
followed by immunoblotting. 

GC-MS analysis. Metabolites from cells were extracted in 800 µl of 80% methanol
(at −80°C). Next the extracts were centrifuged at 4°C for 15 min at 20,000g and
the supernatants were dried in a vacuum centrifuge. Twenty-five microlitres of a 
2% methoxyamine hydrochloride solution (20 mg dissolved in 1 ml pyridine) was 
added to the dried fractions, which were then incubated at 37 °C for 90 min. Then 
75 µl of N-tert-butyldimethylsilyl-N-methyltrifluoroacetamide with 1% N-tert-
butyldimethylchlorosilane (Sigma-Aldrich) was added and the reaction was carried
out for 30 min at 60°C. Reaction mixtures were centrifuged for 15 min at 20,000g at 
4°C to remove insolubilities and the supernatant was transferred to a glass vial with 
conical insert (Agilent). GC-MS analyses were performed on an Agilent 7890A 
GC equipped with a HP-5 ms 5% Phenyl Methyl Silox (30 m, 0.25 mm internal 
diameter, 0.25 µm; Agilent Technologies) capillary column, interfaced with a triple
quadrupole tandem mass spectrometer (Agilent 7000B, Agilent Technologies) 
operating under ionization by electron impact at 70 eV. The injection port, inter- 
face and ion source temperatures were kept at 230°C. The temperature of the 
quadrupoles was kept at 150°C. The injection volume was 1 µl, and samples were
injected at a 1:10 split ratio. The helium flow was kept constant at 1 ml min⁻¹. The
temperature of the column started at 100°C for 5 min and increased to 260°C at
2°C min⁻¹. Next, a 40°C min⁻¹ gradient was applied until the temperature reached
300°C. After the gradient, the column was heated for another 3 min at 325°C. The 
GC-MS analyses were performed in single-ion-monitoring scanning mode for the 
isotopic pattern of metabolites. 
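
Isotopic patterns of this kind are often summarized as a fractional contribution: the average fraction of a metabolite's carbons (or nitrogens) that carry the label. The sketch below uses the standard definition FC = Σ(i × mᵢ)/(n × Σ mᵢ), where mᵢ is the abundance of the M+i isotopologue and n is the number of labelable atoms. It assumes the abundances have already been corrected for natural isotope abundance, a step the sketch does not perform, and the function name is hypothetical.

```python
from typing import Sequence


def fractional_contribution(isotopologues: Sequence[float]) -> float:
    """Mean labelled fraction from abundances of M+0, M+1, ..., M+n."""
    n = len(isotopologues) - 1          # number of labelable atoms
    total = sum(isotopologues)
    if n < 1 or total <= 0:
        raise ValueError("need at least M+0 and M+1 with a positive total")
    weighted = sum(i * abundance for i, abundance in enumerate(isotopologues))
    return weighted / (n * total)


# Illustrative pattern for a five-carbon metabolite such as glutamine:
# half unlabelled (M+0), half fully labelled (M+5).
print(fractional_contribution([0.5, 0.0, 0.0, 0.0, 0.0, 0.5]))
```

Dividing by the total abundance means raw peak areas can be passed in without first normalizing them to sum to one.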

LC-MS analysis. Polar metabolites were extracted using 250 µl of a 50/30/20
methanol/acetonitrile/10 mM ammonium acetate pH 9.3 (containing 2 µM of
deuterated (d27) myristic acid as internal standard) extraction buffer. Following
extraction, precipitated proteins and insoluble matter were removed by centrifugation
at 20,000g for 20 min at 4°C. The supernatant was transferred to the appropriate
mass spectrometry vials. Measurements were performed using a Dionex
UltiMate 3000 LC System (Thermo Scientific) in-line connected to a Q-Exactive 
Orbitrap mass spectrometer (Thermo Scientific). Fifteen microlitres of sample was 
injected and loaded onto a Hilicon iHILIC-Fusion(P) column (Achrom). A linear 
gradient was applied starting with 90% solvent A (LC-MS grade acetonitrile) and 
10% solvent B (10 mM ammonium acetate pH 9.3). From 2 to 20 min the gradient 
changed to 80% B and was kept at 80% until 23 min. Next a decrease to 40% B 
was applied to 25 min, further decreasing to 10% B at 27 min. Finally, 10% B was 
maintained until 35 min. The solvent was used at a flow rate of 200 µl min⁻¹, and the
column temperature was kept constant at 25°C. The mass spectrometer operated
in negative-ion mode, and the settings of the HESI probe were as follows: sheath 
gas flow rate at 35, auxiliary gas flow rate at 10 (at a temperature of 260°C). Spray 
voltage was set at 4.8 kV, temperature of the capillary at 300°C and S-lens RF 
level at 50. A full scan (resolution of 140,000 and scan range of m/z 70-1,050) 
was applied. For the data analysis, we used an in-house library and metabolites of 
interest were quantified (area under the curve) using the XCalibur 4.0 (Thermo 
Scientific) software platform. 
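
The LC gradient described above can be written as a table of breakpoints with linear interpolation between them, which makes it easy to check the solvent composition at any retention time. The sketch below is an illustration only; it assumes the initial 10% solvent B is held from 0 to 2 min (the text gives the starting composition and the first change at 2 min), and the names are hypothetical.

```python
# (time in min, % solvent B); solvent A is 100 - %B.
BREAKPOINTS = [(0, 10), (2, 10), (20, 80), (23, 80), (25, 40), (27, 10), (35, 10)]


def percent_b(t_min: float) -> float:
    """Percentage of solvent B at time t_min, by linear interpolation."""
    if not BREAKPOINTS[0][0] <= t_min <= BREAKPOINTS[-1][0]:
        raise ValueError("time outside the 0-35 min programme")
    for (t0, b0), (t1, b1) in zip(BREAKPOINTS, BREAKPOINTS[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: breakpoints cover the whole range")


for t in (0, 11, 21.5, 24, 30):
    print(f"{t:>5} min: {percent_b(t):.1f}% B, {100 - percent_b(t):.1f}% A")
```

The same table-plus-interpolation form applies to the HPLC gradient used for the energy charge measurement if its breakpoints are substituted.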

In vitro assays. Endothelial spheroid capillary sprouting. The assay was performed 
following established protocols. To form the spheroids, ECs were cultured overnight
in hanging drops in EGM2 medium with methylcellulose (Sigma-Aldrich;
20% volume of a 1.2% solution of methylcellulose viscosity 4,000 cP). Spheroid 
sprouting involves both EC proliferation and migration. To have a ‘clean’ view on 
the migration aspect in sprouting, we also included conditions in which we blocked 
EC proliferation before sprout formation. More precisely, mitotic inactivation was 
achieved by adding mitomycin C (1 µg ml⁻¹) to the medium. To induce sprouting,
spheroids were embedded in a collagen gel and incubated for 20 h. If required,
chemical compounds (fasudil at 10 µM, H1152 at 1 µM and Y27632 at 10 µM)
were added during the collagen-gel-incubation step. Spheroids were then fixed 
with 4% paraformaldehyde and imaged under phase-contrast illumination with a 
Motic AE 31 microscope (Motic Electric Group) or a Leica DMI6000B microscope 
(Leica Microsystems). Phase-contrast images were used to quantify the number of 
sprouts per spheroid and the total sprout length (cumulative length of all sprouts on 
a spheroid). Spheroid body circumference was measured to correct for differences 
in sizes of the spheroid. Per experiment (that is, per individual HUVEC isolation) 
at least 10 spheroids per condition were analysed. 

Scratch wound assays. Seventy-five thousand HUVECs were seeded in a 24-well 
format and were allowed to reach confluency over the next 24 h. At time T₀ the
confluent monolayer was scratched with a 200-µl pipette tip and photographed. The
cells were further incubated for the indicated times and photographed again at time
point Tₓ. The gap area at T₀ minus the gap area at Tₓ was measured with ImageJ
and expressed as % migration distance. Per well, three non-overlapping regions 
along the scratch were analysed. Much like the spheroid sprouting, scratch wound 
healing is a combined readout for EC migration and proliferation. Therefore, we 
also included conditions in which the ECs were pre-treated with mitomycin C 
(1 µg ml⁻¹) to rule out the effect of proliferation.
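
The scratch wound readout can be sketched as below. The text specifies the difference between the gap area at the start and at the later time point, reported as a percentage, and three regions per well. Dividing by the starting gap area to obtain the percentage is an assumption, as are the function names; areas are taken to be ImageJ measurements in any consistent unit.

```python
from statistics import mean
from typing import Iterable, Tuple


def percent_closure(gap_start: float, gap_later: float) -> float:
    """Closed fraction of the scratch, as % of the starting gap area."""
    if gap_start <= 0:
        raise ValueError("starting gap area must be positive")
    return 100.0 * (gap_start - gap_later) / gap_start


def well_closure(regions: Iterable[Tuple[float, float]]) -> float:
    """Mean closure over the non-overlapping regions analysed in one well."""
    return mean(percent_closure(start, later) for start, later in regions)


# Three illustrative regions (start area, later area), not data from the study.
print(well_closure([(1000, 250), (1000, 500), (1000, 750)]))
```

Normalizing to the starting area makes wells comparable even when the pipette tip leaves scratches of slightly different width.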

Boyden chamber assays. Fifty thousand HUVECs were seeded on 0.1% gelatin- 
coated transwells and allowed to adhere. Then, the transwells were washed and 
refed with medium containing only 0.1% FBS and placed in bottom wells containing 
medium with 5% FBS as a pro-migratory stimulus. Sixteen hours later, transwells 
were processed and analysed for numbers of migrated cells. Pre-treatment with 
mitomycin C (see above) was applied. 

Velocity of random movement. This was assessed on HUVECs that were sparsely 
seeded on glass-bottom 24-well plates. Time-lapse videos were generated by con- 
focal image acquisition at 4-min intervals. Velocity of movement was determined 
by tracking the nucleus position as a function of time (µm h⁻¹) (Tracking Tool,
Gradientech AB). Per condition, on average 2 or 3 individual cells were traced in 
each biological repeat. 
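
Velocity from nucleus tracking amounts to dividing the path length between consecutive frames by the elapsed time. The sketch below is a minimal illustration, assuming nucleus positions are available as (x, y) coordinates in µm at the 4-min frame interval stated above; it is not the algorithm of the tracking software used, and the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def mean_speed_um_per_h(track_um: Sequence[Tuple[float, float]],
                        interval_min: float = 4.0) -> float:
    """Total path length divided by total elapsed time, in µm per hour."""
    if len(track_um) < 2:
        raise ValueError("need at least two positions")
    path_um = sum(math.dist(a, b) for a, b in zip(track_um, track_um[1:]))
    elapsed_min = interval_min * (len(track_um) - 1)
    return path_um * 60.0 / elapsed_min


# Illustrative track: two 5-µm steps over two 4-min intervals.
print(mean_speed_um_per_h([(0.0, 0.0), (3.0, 4.0), (6.0, 8.0)]))
```

Summing step lengths, and not taking the start-to-end displacement, is what makes this a measure of random movement: a cell that wanders and returns to its origin still registers a non-zero speed.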

Lamellipodial area. This was measured on sparsely seeded phalloidin-stained 
ECs with Leica MM AF morphometric analysis software (Leica Microsystems) 
with in-house-developed macros and is expressed in percentage of total cell area. 
Treatment with MSO (1 mM), Y27632 (10 µM), fasudil (10 µM), H1152 (1 µM),
ML7 (15 µM) and peptide 18 (15 µM) was carried out 24 h before analysis of
the cells. Per experimental condition, a minimum of ten individual cells was 
analysed. 

Staining and quantification of VE-cadherin junctions. VE-cadherin staining and 
quantification of junctional length and gap index was performed as previously 
described. First, the total junctional length (100%) was determined by summing
up all segments, then the sum of all continuous segments was calculated as the 
percentage of total junctional length. The percentage difference between total and 
continuous represents the discontinuous length. Gap size index (intercellular gap 
area/cell number) was determined with the formula ([intercellular gap area/total 
cell area] × 1,000)/cell number. Junctional lengths, intercellular gap area and total
cell area were defined manually with ImageJ. For each condition, a minimum of 
10 fields was quantified (10-15 cells per field on average) per experiment, and data 
shown represent the mean of at least three independent experiments. 
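
The junction metrics described above can be written out as a short sketch. The formulas follow the text; the function names and the input format (manually traced segment lengths flagged as continuous or not) are assumptions made for illustration only.

```python
def junction_percentages(segments):
    """Return (% continuous, % discontinuous) junctional length.

    segments: list of (length, is_continuous) tuples, e.g. traced in ImageJ.
    The total junctional length (sum of all segments) is set to 100%.
    """
    total = sum(length for length, _ in segments)
    if total <= 0:
        raise ValueError("total junctional length must be positive")
    continuous = sum(length for length, cont in segments if cont)
    pct_continuous = 100.0 * continuous / total
    # The discontinuous share is the difference between total and continuous.
    return pct_continuous, 100.0 - pct_continuous

def gap_size_index(gap_area, total_cell_area, cell_number):
    """([intercellular gap area / total cell area] x 1,000) / cell number."""
    return (gap_area / total_cell_area * 1000.0) / cell_number
```

Both areas must be in the same unit, so that the index depends only on their ratio and on the number of cells in the field.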
Transendothelial electrical resistance. Fifty thousand ECs were seeded on 6.5-mm 
0.1% gelatin-coated polyester transwells, 0.4-µm pore size (Costar ref. 3470, Sigma-Aldrich).
The electrical resistance was measured with an Endohm-6 electrode
(World Precision Instruments) connected to an EVOM2 voltohmmeter (World 
Precision Instruments). Gelatin-coated wells without cells were used to meas- 
ure the intrinsic electrical resistance of the inserts for background subtraction. 
Measurements were performed every day for four consecutive days, with at least 
two measurements per condition. 
Actin dynamics and Rho (kinase) activity assays. Latrunculin wash-out. ECs were
treated with latrunculin B (100 ng ml⁻¹) for 30 min and were then washed three
times with culture medium. The cells were fixed at the indicated time points and 
stained with phalloidin to visualize actin stress fibres. 

The F-actin/G-actin ratio. The ratio was determined for GLULKD versus control
ECs, in 4% paraformaldehyde-fixed cells that were permeabilized for 10 min in
PBS with 0.2% Triton X-100 and stained with phalloidin-Alexa 488 and DNase
I-Alexa 594 (1:200)*4. Fluorescence intensities were quantified with ImageJ and
were based on grey values. On average, ten individual cells were analysed per 
experimental condition. 

RHOJ activity measurements. Cells were lysed in buffer containing 50 mM Tris, 
pH 7.6, 150 mM NaCl, 1% Triton X-100, 0.5 mM MgCl₂, protease inhibitors and
0.1 xg! biotinylated CRIB-peptide. After spinning down for 4 min at 18,000g at 
4°C, 50 µl streptavidin-coated beads was added to the lysates. Subsequently, samples
were rotated for 30 min at 4°C, beads were washed 4 times in the above buffer after 
which they were boiled for 5 min in reducing agent and loading buffer*®. As negative 
controls in this assay, we used lysates from RHOJKD ECs, a streptavidin-beads-only
condition and lysates in which the biotinylated CRIB-peptide was replaced
by an irrelevant biotinylated protein (Fig. 4c). 

RHOA/B/C activity. This was determined with glutathione S-transferase–Rhotekin
pull down assays following previously established protocols*®. 

ROCK activity. This was assayed by determining phosphorylation of the ROCK target 
myosin light-chain 2 on western blot or by immunostaining. Fluorescence intensities 
from immunostaining were quantified with ImageJ and were based on grey values. 
Confocal and high-resolution imaging. Confocal imaging. Imaging was per- 
formed on a Zeiss LSM 510 Meta NLO or Zeiss LSM 780 confocal microscope (oil 
objectives: ×40 with numerical aperture (NA) 1.3, ×63 with NA 1.4, ×100 with
NA 1.3) with ZEN 2011 software (Carl Zeiss). Within individual experiments, 
all images across different experimental conditions were acquired with the same 
settings. 

DORA-RHOA-FRET imaging. RHOA activity was measured in living HUVECs by 
monitoring fluorescence intensities of the FRET-acceptor yellow fluorescent protein 
(YFP) and the donor cyan fluorescent protein (CFP) as previously described*”. In 
brief, a Zeiss Observer Z1 microscope, with a Chroma 510 DCSP dichroic splitter, 
two Hamamatsu ORCA-R2 digital CCD cameras and an attached dual camera 
adaptor (Zeiss) controlling a 510 DCSP dichroic mirror, was used for simultaneous 
monitoring of CFP and YFP emissions using filter sets ET 480/40 and ET 540/40m 
(Chroma Technology), respectively. To excite the CFP donor, an ET 436/20x filter and a 455
DCLP dichroic mirror were used (Chroma). For FRET/CFP ratiometric processing,
CFP and YFP images were processed using the MBF ImageJ collection. The images
were background-subtracted, aligned and a threshold was applied. Finally, the
FRET/CFP ratio was calculated and a custom lookup table was applied to generate a
colour-coded image, in which white and red colours illustrate high and blue colours 
illustrate low RHOA activities. 

BiFC imaging and quantification. BiFC was evaluated using a laser scanning micro- 
scope (Fluoview FV 1000, Olympus) equipped with a UPLSAPO 60x oil objective 
(NA 1.35). Before imaging, cells were fixed with 4% (v/v) paraformaldehyde and 
stained with 4’,6-diamidino-2-phenylindole (DAPI; 1:1,000 dilution, Invitrogen). A 
488-nm laser was used for exciting eGFP, whereas DAPI was excited using a 405-nm 
laser. A DM405/488/559/635 polychroic mirror was used to guide the excita- 
tion lasers to the sample. Fluorescence images of fixed cells were acquired using 
a sampling speed of 4 µs per pixel. Emission light was collected at 430–470 nm
and 500–550 nm, for DAPI and eGFP, respectively. The images were acquired with
a pixel size of 207 nm (1,024 × 1,024 pixels). BiFC was first established in HEK
cells expressing GS-eGFP1/2 and RHOJ-eGFP2/2 from one expression vector, with
a construct overexpressing an unfused N-terminal eGFP half-site together with
RHOJ coupled to the C-terminal eGFP half-site as a negative control (data not
shown). To determine the effect of deleting the first 20 amino acids in RHOJ on
BiFC in ECs, separate expression constructs for GS-eGFP1/2, RHOJ-eGFP2/2 and
ΔN-RHOJ-eGFP2/2 were used (Extended Data Fig. 7e). Quantification of expression
efficiency was carried out using an in-house-generated routine in MATLAB.
TIRF microscopy. An in-house-generated setup based on an inverted microscope 
(IX83, Olympus) was used to detect single molecules under total internal reflection 
(TIRF) mode. The setup was equipped with an electron-multiplying CCD camera
(ImagEM C9100-13; Hamamatsu Photonics) and an APON 60XOTIRF objective
lens (NA 1.49, Olympus). The GS-mEos3.2 molecules were excited with a 561-nm 
line from a DPSS laser (200 mW; Coherent) and converted with a 405-nm line 
from a diode laser (Cube, 100 mW; Coherent). Before expansion, the laser lines 
were combined using a 405bcm dichroic mirror. The laser lines were guided onto 
the sample by a dichroic mirror, z488/561/633rpc. The red fluorescence of the
mEos3.2 was detected through a long-pass filter 572 (HQ572LP), in combination 
with a band-pass filter HQ590M40-2P. All the filters were purchased from Chroma. 
Time-lapse fluorescence images were recorded with continuous illumination at a 
62.5-Hz acquisition rate (16 ms per frame). 
Single-particle tracking. For calculation of single-molecule coordinates, the program
‘Localizer’ running from MATLAB was used**. After localization, the positions of a
molecule detected in consecutive frames are connected to reconstruct a trajectory 
using in-house-developed software in MATLAB. Coordinates presented in consec- 
utive frames are linked to form a single trajectory when they uniquely appear at a 
distance smaller than 856 nm (corresponding to 8 pixels). Trajectories with at least 
three steps were analysed using variational Bayes single-particle tracking analysis 
(vbSPT), a software package for analysis of single-particle diffusion trajectories, in 
which the diffusion constants switch randomly according to a Markov process”. 
For experiments involving microscopic analysis of individual (immuno-)stained 
cells, initial confocal image acquisition was done by researchers blinded to the 
experimental condition to avoid biased imaging of cells presenting a certain phe- 
notype. 
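
The linking rule described under single-particle tracking can be sketched as follows. This is an illustrative reading, not the in-house MATLAB software: it assumes that "uniquely" means exactly one candidate in the next frame lies within the cut-off and that this candidate is within the cut-off of no other trajectory. All names are hypothetical.

```python
import math

MAX_LINK_NM = 856.0  # linking cut-off from the text (8 pixels)
MIN_STEPS = 3        # minimum number of steps kept for analysis

def link_trajectories(frames, max_dist=MAX_LINK_NM, min_steps=MIN_STEPS):
    """frames: list (one entry per frame) of lists of (x, y) positions in nm."""
    finished, active = [], []
    for points in frames:
        # Candidate points within the cut-off of each trajectory end.
        near_traj = {i: [j for j, p in enumerate(points)
                         if math.dist(traj[-1], p) < max_dist]
                     for i, traj in enumerate(active)}
        # Trajectories within the cut-off of each point.
        near_point = {j: [i for i, js in near_traj.items() if j in js]
                      for j in range(len(points))}
        next_active, used = [], set()
        for i, traj in enumerate(active):
            js = near_traj[i]
            # Link only when the match is unique in both directions.
            if len(js) == 1 and len(near_point[js[0]]) == 1:
                traj.append(points[js[0]])
                used.add(js[0])
                next_active.append(traj)
            else:
                finished.append(traj)
        # Every unlinked localization starts a new trajectory.
        next_active.extend([p] for j, p in enumerate(points) if j not in used)
        active = next_active
    finished.extend(active)
    return [t for t in finished if len(t) - 1 >= min_steps]
```

Trajectories that pass the length filter would then be handed to a diffusion analysis such as vbSPT, which is not reproduced here.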
Mice. GlulECKO mice. To obtain inducible EC-specific GS-knockout mice, Glullox/lox
mice were intercrossed with VE-cadherin-creERT2 mice or with Pdgfb-creERT2
mice and named GlulvECKO and GlulpECKO, respectively. Correct Cre-mediated
excision of the loxP-flanked Glul segment in tamoxifen-treated GlulECKO mice was
confirmed by PCR analysis of genomic DNA (Extended Data Fig. 1d, e).
Generation of Glul+/GFP chimaeras. Blastocysts were collected from superovulated
C57BL/6 females at post-coital day 3.5 and were cultured for 5–8 days in ES
cell culture medium consisting of Knockout DMEM medium (Invitrogen), with 
2 mM L-glutamine, FBS (Hyclone, Thermo Scientific), MEM non-essential amino
acids 100X (Invitrogen), 0.01 mM β-mercaptoethanol (Sigma-Aldrich), 1 mM
sodium pyruvate (Invitrogen), 100 U ml⁻¹ penicillin, 100 µg ml⁻¹ streptomycin and
2,000 U ml⁻¹ leukaemia inhibitory factor (Merck, Millipore). Afterwards, the
inner cell mass was selectively removed from the trophectoderm, trypsinized and 
replated on a mitomycin C-arrested MEF feeder monolayer. ES cells were fed every 
day and passaged every 2–4 days onto new feeder cells. Glul+/GFP ES cells (E141B10
ES cell line)® were injected into C57BL/6 blastocysts and high chimaeric pups were 
killed at P5 for detection of GFP in the retinal microvasculature. 
In vivo models. Analysis of dorsal dermal blood vessel network. From E11.5 to E13.5 
(where detection of the vaginal plug was designated E0.5), GlulvECKO pregnant dams
were treated with tamoxifen (50 mg kg⁻¹) by oral gavage. At E16.5 they were euthanized
by cervical dislocation after which embryos were dissected from the uterus.
Yolk sacs were collected, washed with PBS and used for genotyping of the embryos. 
The embryos were fixed for 10 min in 1% paraformaldehyde before dissection of 
the dorsal skin. The epidermal and dermal layers were separated under a dissec- 
tion microscope. Dissected back skins were permeabilized overnight (0.5% Triton 
X-100, 0.01% sodium deoxycholate, 1% bovine serum albumin, 0.02% sodium 
azide) before whole-mount immunostaining with CD31. To systematically analyse 
the same region for each embryo, one rectangular confocal image (1,700 × 1,100 µm)
was taken at the anterior side of the skin specimen with the upper longer side of
the rectangle placed on the midline. Within each rectangular picture the number
of branch points was determined with the cell counter tool in ImageJ in six regions
of interest (ROIs) (250 × 250 µm), three in the top half and three in the bottom half
of the rectangle, not overlapping with the larger arteries and veins. 
Neonatal retinal angiogenesis. EC-specific Glul deletion was obtained by intraperitoneal
administration of tamoxifen (Sigma; 10 mg kg⁻¹; dissolved in 1:10 ethanol:oil
solution) once daily from P1 to P3 in GlulvECKO or once at P2 for GlulpECKO. For
in vivo proliferation quantification, EdU (Invitrogen) was injected intraperito- 
neally 2 h before euthanasia. Unless stated otherwise, retinas were isolated at P5 
as previously described* and fixed in 2% paraformaldehyde for 2 h. Isolectin B4 
(IB4), EdU, NG2 and ColIV stainings were performed as previously described’.
Radial outgrowth of the vascular plexus, vascular area, branch points, number of 
filopodia and number of distal sprouts were analysed on IB4-stained retinas (see 
below) with ImageJ. Numbers of branch points and EdU+ ECs were quantified in
200 × 200-µm ROIs; per retina 12 ROIs were placed at the front of the vascular
plexus and 8 ROIs were placed more towards the centre of the plexus. Filopodia and
distal sprouts were quantified on ten high-magnification (×63) images per retina,
each representing approximately 200 µm of utmost vascular front. For analysis of
the retinal vasculature at P21 (3 weeks old) and P42 (6 weeks old), mice underwent 
the same tamoxifen treatment regimen as for analyses at P5. In addition, different 
tissues were collected from P42 mice for endoglin and CD34 staining to study 
blood vessels in different vascular beds. 
Oxygen-induced retinopathy. Oxygen-induced ROP was induced by exposing 
C57BL/6 pups to 70% oxygen from P7 to P12. Pups were then returned to normoxia
and injected daily with 20 mg kg⁻¹ MSO. At P17, pups were euthanized and eyes
were enucleated, fixed in 4% paraformaldehyde and retinal flatmounts were stained 
for isolectin B4**. MSO-treated mice retained normal behaviour notwithstanding 
observable weight loss. Mosaic tile images were captured using the inverted Leica
DMI6000B epifluorescence microscope (Leica). The vascular tuft area (the complete
retina was analysed, no ROIs were used) and the vaso-obliterated area were
measured with ImageJ software and are expressed as percentage of the
total retinal area.


Corneal (micro-)pocket assay. This assay, to induce neovascularization of the 
avascular cornea, was performed as previously described“. In brief, in the eyes 
of eight-week-old C57BL/6 mice, a lamellar micropocket was dissected towards 
the temporal limbus to enable the placement of a basic fibroblast growth factor 
(bFGF)-containing pellet on the corneal surface. Five days after implanting the 
pellets, the mice were euthanized, the eyes were enucleated and the corneas were 
excised and fixed in 70% ethanol before CD31 antibody staining. After staining, 
the corneas were flat-mounted and imaged on a Zeiss LSM 780 confocal microscope.
CD31+ area was measured in ImageJ after thresholding the signal and
is expressed as percentage of total cornea area. Production of the pellets was 
carried out as previously described“. The pellets contained 20 ng bFGF and the 
concentration of MSO in the initial solution from which the pellets were made 
was 10 mM. 

Imiquimod-induced skin inflammation. Ten-week-old female BALB/C mice 
received a daily topical dose of 5% imiquimod cream (62.5 mg) on their shaved 
backs for four days to induce skin inflammation’. One hour after each adminis- 
tration of the cream, the same skin area was treated either with Vaseline jelly or 
Vaseline jelly containing MSO (low dose, 20 mg kg⁻¹; or high dose, 40 mg kg⁻¹).
The MSO treatment did not affect the body weight of the mice. Skins were col- 
lected and fixed in 4% paraformaldehyde. Paraffin sections of skins were stained 
for CD105 (R&D Systems) and haematoxylin and eosin. Images were captured 
with a Leica DMI6000B microscope (Leica Microsystems). Per mouse, ten images 
representing different locations along the total length of the skin specimen were 
analysed for CD105+ area.

Miles vascular permeability assay. Eight-week-old female BALB/C mice were 
treated for three consecutive days with 20 mg kg⁻¹ day⁻¹ MSO or with vehicle
before injection with 300 µl 0.5% Evans blue dye. The inflammatory irritant mustard
oil (0.25 ml allyl isothiocyanate in 4.75 ml mineral oil) was applied on one of
the ears with a cotton swab to induce vascular permeability. Mineral oil as a control 
was applied on the other ear. After 15 min, mustard oil or mineral oil was again 
applied on the ear for 30 min, after which the circulation was flushed with saline 
for 3 min under complete anaesthesia and mice were perfused with 1% paraform- 
aldehyde in 50 mM citrate buffer (pH 3.5) for 2 min. Ears were cut and minced in 
formamide and incubated at 55°C overnight to extract the Evans blue from the 
tissue. Quantification of the dye was performed by a spectrophotometric optical
density measurement at 620 nm. 

Haematological profiling in six-week-old mice. Profiling was performed with a Cell 
Dyn 3700 device (Abbott Diagnostics) according to the manufacturer's guide- 
lines. Plasma measurements for different liver and inflammation parameters were 
performed in the clinical laboratory of UZ Leuven. Prior randomization was not 
applicable for any of the above mouse models given that all animal treatments 
were performed in baseline conditions. No statistical methods were used to pre- 
determine the sample size. For all mouse experiments, data analysis was carried 
out by researchers blinded to the group allocation. All animal procedures were 
approved by the Institutional Animal Care and Research Advisory Committee of 
the University of Leuven. 

In silico screening for palmitoylation sites. The human RHOJ protein sequence 
was screened for putative palmitoylation sites on the SwissPalm website” entering 
‘RHOJ’ as the protein name.

Modelling and simulations. The GS models were built starting from X-ray crystal- 
lographic structures retrieved from the Protein Data Bank (entry 2OJW for human 
GS and 1FPY for bacterial GS). All simulations were run with Gromacs 5.1.4° 
and the Amber FF14SB* force field, while palmitoyl-CoA was parametrized with 
GAFF and the point charges were calculated with Gaussian 09*” at the Hartree— 
Fock level with a 6-31G* basis set. The different models were then embedded in a 
TIP3P water box, and counter ions were added to ensure overall charge neutrality. 
An initial 2,000 steps of steepest descent and 500 steps of conjugate gradient were
applied to minimize the geometry and remove steric clashes, followed by 10 ns of 
isothermal-isobaric (NPT) equilibration. The Berendsen barostat was applied to 
keep the pressure around 1 atm, while the temperature of 300 K was maintained 
throughout all the simulations with the V-rescale algorithm**. Molecular dynamics 
production runs (500-ns long) were carried out for all the systems in the canonical
(NVT) ensemble, for a cumulative total of 2.5 µs. The particle mesh Ewald
(PME)-Switch algorithm was used for electrostatic interactions with a cut-off of 
1 nm, and a single cut-off of 1.2 nm was used for Van der Waals interactions. Four 
simulations for human GS and two for the GS of Salmonella typhimurium were 
run by placing the CoA moiety close to the adenosine binding site and allowing 
different initial positions for the palmitoyl tail. The CoA head invariably docked
and remained tightly bound to the adenine binding site in all simulations. Among 
these, two favourable alternative arrangements (Extended Data Fig. 8b) for the tail 
were identified in both systems. In one of these conformations, the beginning of the 
palmitate tail (from the point of view of the CoA moiety) approaches very close to 
the conserved Cys209 (human residue numbering, conformation A in Extended
Data Fig. 8b, details in Extended Data Fig. 8c), and in the other conformation
(conformation B in Extended Data Fig. 8b, details in Extended Data Fig. 8d) it
approaches the conserved Ser65 and Ser75.

Multiple sequence alignments. A multiple sequence alignment of the GS protein 
across different species was performed with the Basic Local Alignment Search 
Tool (BLAST). The algorithm matches sequences according to local similarity, by 
optimizing their maximal segment pair score (MSP). The 100 matches with the 
highest identity to the Homo sapiens sequence surrounding amino acid C209 were 
taken from the UniProtKB/Swiss-Prot refined database. 

Statistical analysis. Data represent mean ± s.e.m. of pooled experiments unless
otherwise stated. Scatters in bar graphs represent the values of independent exper- 
iments or individual mice. In cases for which individual values are very similar, 
scatter points overlap and may no longer be visible as individual points. n values 
represent the number of independent experiments performed or the number of 
individual mice phenotyped. Statistical significance between groups was calculated 
with one of the following methods: for comparisons to point-normalized data, a 
two-tailed one-sample t-test was used in GraphPad Prism 7. For pairwise comparisons,
two-tailed unpaired t-tests were used in GraphPad Prism 7. For multiple comparisons
within one dataset, one-way ANOVA with Dunnett's multiple comparison
(comparing every mean with the control mean rather than comparing every mean
with every other mean) was used in GraphPad Prism 7. Mixed-model statistics
(this test does not assume normality or equal variance) was used with the experi- 
ment as a random factor only in cases for which confounding variation in baseline 
measurements between individual EC isolations (for each experiment, ECs were 
freshly isolated from individual human umbilical cords) or mouse litters precluded 
the use of the above statistical tests. For this, R and the lme4 package were used;
P values were obtained with the Kenward–Roger F-test for small mixed-effect
model datasets. In the most severe cases, the individual data points (each data point 
being the mean of the technical replicates within an experiment or an individual 
mouse) in the bar graphs have been colour-coded per experiment or per litter to 
show the baseline variation. The sample size for each experiment was not pre- 
determined. A P value <0.05 was considered significant. 
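
As an illustration of the summary statistics named above, the following minimal sketch computes the mean ± s.e.m. and the one-sample t statistic for point-normalized data. The analyses in the text were run in GraphPad Prism 7. The function names and the default reference value of 100 (data expressed as a percentage of control) are assumptions, and the P value is left to a t-distribution routine from a statistics package.

```python
import math
import statistics

def mean_sem(values):
    """Return (mean, s.e.m.) of independent experiments."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # s.e.m. = sample standard deviation / sqrt(n)
    return statistics.mean(values), statistics.stdev(values) / math.sqrt(n)

def one_sample_t(values, reference=100.0):
    """Return (t statistic, degrees of freedom) versus a fixed reference.

    Assumption: the control is normalized to `reference`, so it has no
    variance of its own and a one-sample test is appropriate.
    """
    mean, sem = mean_sem(values)
    return (mean - reference) / sem, len(values) - 1
```

A two-tailed P value follows from the t distribution with the returned degrees of freedom.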

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Figures 1, 4, 5 and Extended Data Figs. 1, 7, and 8 have asso- 
ciated raw data (uncropped blots and/or gel pictures) in Supplementary Fig. 1. 
Figures 1, 2 and Extended Data Figs. 1, 4 have associated raw data (Excel files) for 
all bar graphs representing data from experiments involving mouse models. For 
the molecular modelling of palmitoyl-CoA docking into GS, models and trajecto- 
ries are available on Figshare (doi: 10.6084/m9.figshare.6575438). Any additional 
information required to interpret, replicate or build upon the Methods or findings 
reported in the manuscript is available from the corresponding author upon request. 
Extended Data Fig. 1 | GLUL knockout impairs vessel sprouting.
a, GLUL mRNA levels in HUVECs (n = 9 donors), lung ECs (n = 5),
colon ECs (n = 4), liver ECs (n = 3), human umbilical artery ECs
(HUAECs) (n = 2) and human blood outgrowth ECs (BOECs) (n = 2)
(mean ± s.e.m.; *P < 0.05 versus HUVEC, Student’s t-test) and in HEPG2
cells (mean ± s.e.m.; n = 3; *P < 0.05 versus HUVEC, Student’s t-test).

b, c, Western blot of GS protein levels in HUVECs and HEPG2 cells in
medium containing 0.6 mM glutamine (+) or 0.025 mM glutamine (−)
(b), and in isolated mouse liver ECs (mLiECs) and mouse astrocytes (c)
(representative immunoblots of two independent experiments are shown).
d, e, Genomic organization of the loxP-flanked Glul allele before and after
Cre-mediated excision (d) and correct recombination of the lox allele
(L) in GlulvECKO and GlulpECKO mice upon tamoxifen (Tam) treatment, as
assessed by genomic DNA PCR (e; the PCRs to amplify the loxP-flanked
Glul allele (lox) or to amplify the cre-recombined allele (Δ) were run in
separate reactions but loaded in the same lane; the gel picture shown is
representative of all control, GlulvECKO and GlulpECKO mice used in this
study). f, Quantification of branch points at the rear of the plexus in
GlulvECKO mice (mean ± s.e.m.; n = 10 mice for GlulvECKO and 11 for WT
controls from 3 litters; *P < 0.05 versus WT littermates, mixed-models R
statistics). g, Pericyte coverage of retinal microvessels in WT and GlulvECKO
littermates determined by NG2 staining and shown as the NG2+ area as a
percentage of the vessel area (mean ± s.e.m.; n = 4 mice for WT and 3 for
GlulvECKO from 1 litter; NS, P > 0.05 versus WT, Student’s t-test).

h, Reduced complexity of the retinal vascular front in P5 GlulvECKO
compared with WT mice, determined by the number of branches on distal
sprouts (mean ± s.e.m.; n = 13 mice for WT and 21 for GlulvECKO from 5
litters; *P < 0.05 versus WT, Student’s t-test). i, Quantification of EdU+
ECs at the rear of the plexus (mean ± s.e.m.; n = 12 mice for WT and 22
for GlulvECKO from 4 litters; NS, P > 0.05 versus WT littermates, Student’s
t-test). j–m, IB4 staining of P5 retinal vascular plexuses from WT (j) and
GlulpECKO (k) mice (representative pictures with magnification shown in
the inset; A, artery; V, vein) and quantification of branch points at the
front (l) and the rear (m) of the plexus (mean ± s.e.m.; n = 10 mice for
WT and 18 for GlulpECKO from 4 litters; *P < 0.05 versus WT littermates,
Student’s t-test). n–u, IB4 staining of the retinal microvasculature of
three-week-old (P21) (n, o) and six-week-old (P42) (r, s) WT and
GlulvECKO littermates (A, artery; V, vein). The lower left insets display a
higher magnification of the IB4-stained superficial plexus, whereas the
lower right insets display a higher magnification of the deep plexus. The
corresponding quantification of the vascular area (p, t) and the branch
point density (q, u) in the superficial and the deep layer is also shown
(mean ± s.e.m.; n = 8 mice for WT and 8 for GlulvECKO at P21, from
two litters; n = 10 mice for WT and 14 for GlulvECKO at P42, from four
litters; NS, P > 0.05 versus WT, Student’s t-test). v–ag, Representative
micrographs of heart (v, z), liver (w, aa) and kidney (x, ab) sections from
WT and GlulvECKO littermates immunostained for the EC marker endoglin
and of lung (y, ac) sections immunostained for the EC marker CD34
and corresponding quantifications of endoglin+ (ad, heart; ae, liver; af,
kidney) or CD34+ (ag, lung) vascular area (mean ± s.e.m.; n = 5 mice (4
for heart) for WT and 7 (6 for heart) for GlulvECKO, from two litters; NS,
P > 0.05 versus WT, Student’s t-test). ah, ai, Images of flat-mounted retinas
from control (ah) and MSO-treated (ai) ROP mice (vaso-obliterated
area in white). Images shown are representative of 7 (ah) and 6 (ai) mice.
Exact P values: HUVEC versus lung ECs: 0.0278; HUVEC versus colon
ECs: 0.1086; HUVEC versus liver ECs: 0.3334; HUVEC versus HEPG2:
<0.0001 (a); <0.0001 (f); 0.3491 (g); <0.0001 (h); 0.8247 (i); 0.0012 (l);
0.050 (m); superficial: 0.1218; deep: 0.1720 (p); superficial: 0.9995; deep:
0.4289 (q); superficial: 0.9792; deep: 0.6602 (t); superficial: 0.7979; deep:
0.1275 (u); 0.9021 (ad); 0.2279 (ae); 0.7647 (af); 0.3614 (ag). Scale bars:
200 µm (j, k, n, o, r, s), 20 µm (v–ac), 1 mm (ah, ai). For gel source images,
see Supplementary Fig. 1.
Extended Data Fig. 2 | Effects of silencing and pharmacological
inhibition of GS on EC viability and central metabolism. a, GLUL
mRNA levels in control ECs and ECs transduced with two different
non-overlapping shRNAs targeting GLUL (GLULKD1 and GLULKD2; GLULKD1 is
used in the experiments in the text and is denoted GLULKD) or transfected
with scrambled siRNA (SCR) or siRNA targeting GLUL (siGLUL). Data
are expressed as a percentage of the respective control, denoted by the
horizontal dotted line (mean ± s.e.m.; n = 28 independent experiments
for GLULKD1, n = 3 independent experiments for GLULKD2 and n = 9
independent experiments for siGLUL; *P < 0.05 versus the respective
control; one-sample t-test). b, c, Quantification of number of sprouts (b)
and total sprout length (c) for spheroid-sprouting assays with GLULKD
ECs and GLULKD ECs expressing an shRNA-resistant GLUL mutant
(rGLUL) (mean ± s.e.m.; n = 3 independent experiments; *P < 0.05 and
NS, P > 0.05 versus control; one-way ANOVA with Dunnett’s multiple
comparison versus control). d, Viability of control (Ctrl) and GLULKD ECs
as measured by lactate dehydrogenase (LDH) release assay (mean ± s.e.m.;
n = 3 independent experiments; NS, P > 0.05 versus control, one-sample
t-test). e, Intracellular levels of reactive oxygen species measured by
CM-H₂DCFDA staining (mean ± s.e.m.; n = 3 independent experiments; NS,
P > 0.05 versus control, Student’s t-test). f, Energy charge measurement
(([ATP] + 1/2[ADP])/([ATP] + [ADP] + [AMP])) in GLULKD and
control ECs (mean ± s.e.m.; n = 3 independent experiments; NS, P > 0.05
versus control, Student’s t-test). g, Ratio of oxidized glutathione (GSSG)
over total glutathione levels (GSSG/(GSH + GSSG)) in GLULKD and
control ECs (mean ± s.e.m.; n = 4 independent experiments; NS, P > 0.05
versus control, Student’s t-test). h, NADP/NADPH ratio in GLULKD and
control ECs (mean ± s.e.m.; n = 5 independent experiments; NS, P > 0.05
versus control, one-sample t-test). i–k, Effect of GLUL knockdown on
major metabolic fluxes including glycolysis (i), glucose oxidation (j) and
glutamine oxidation (k) (mean ± s.e.m.; n = 3 independent experiments
for i, n = 5 for j and n = 4 for k; NS, P > 0.05 versus control, one-sample
t-test). l, m, Oxygen consumption rate (OCR) in control, MSO-treated
and GLULKD ECs in basal state and after injection of oligomycin, FCCP
and antimycin A (l) (mean ± s.e.m.; n = 3 independent experiments),
and calculation of OCRBAS, OCRATP and maximal respiration (m)
(mean ± s.e.m.; n = 3 independent experiments). Exact P values: GLULKD1:
<0.0001; GLULKD2: <0.0001; siGLUL: <0.0001 (a); control versus
GLULKD: 0.0147; control versus GLULKD + rGLUL: 0.9824 (b); control
versus GLULKD: 0.0083; control versus GLULKD + rGLUL: 0.6528
(c); 0.5717 (d); 0.8206 (e); 0.3715 (f); 0.4398 (g); 0.9291 (h); 0.4691 (i);
0.6643 (j); 0.6786 (k). CM-DCF, CM-H₂DCFDA; OCRBAS, basal oxygen
consumption rate; OCRATP, ATP-generating oxygen consumption rate;
RFU, relative fluorescence units.
Extended Data Fig. 3 | GLUL knockdown reduces EC motility. a, Wound
closure in control and GLULKD EC monolayer scratch assays with or
without pretreatment with mitomycin C (mean ± s.e.m.; n = 7 and 5
independent experiments with and without mitomycin C, respectively;
*P < 0.05 versus corresponding control; Student’s t-test). b, Quantification
of lamellipodial area (as a percentage of total cellular area) in control and
GLULKD ECs (mean ± s.e.m.; n = 3 independent experiments; *P < 0.05
versus control; Student’s t-test). c, Wound closure in monolayer scratch
assays with SCR- and siGLUL-transfected ECs (mean ± s.e.m.; n = 5
independent experiments; *P < 0.05 versus SCR; Student’s t-test).
d, Quantification of lamellipodial area (as a percentage of total cellular
area) in SCR- and siGLUL-transfected ECs (mean ± s.e.m.; n = 5
independent experiments; *P < 0.05 versus SCR; Student’s t-test).
e, Proliferation of SCR- and siGLUL-transfected ECs, as measured by [³H]
thymidine incorporation into DNA (mean ± s.e.m.; n = 3 independent
experiments; NS, P > 0.05 versus SCR; Student’s t-test). Exact P values:
control versus GLULKD: 0.0290; control versus GLULKD + MitoC: 0.0223
(a); 0.0088 (b); 0.0407 (c); 0.0083 (d); 0.4335 (e).
Extended Data Fig. 4 | Effects of GLUL silencing on cytoskeleton and
barrier function. a–h, Images of control (a, c, e, g) and GLULKD (b, d, f, h)
ECs after staining for α-tubulin (a, b), F-actin (c, d) and nuclear staining
(e, f); g and h show merged images. The images shown are representative
of 3 independent experiments. i–k, Representative images of phalloidin
(F-actin) + Hoechst-stained liver ECs 6 h after isolation from control
(i) and MSO-treated (j) mice, and corresponding quantification of
F-actin levels (k) (mean ± s.e.m.; n = 5 mice per group; *P < 0.05 versus
control, Student’s t-test). l–n, Representative images of phalloidin-stained
confluent monolayer control (l) and GLULKD (m) ECs aligning a scratch
wound, and quantification of F-actin levels (n) (mean ± s.e.m.; n = 5
independent experiments; *P < 0.05 versus control, Student’s t-test).
o, Quantification of the length of discontinuous and continuous
VE-cadherin-stained junctions in control and GLULKD ECs (mean ± s.e.m.;
n = 4 independent experiments; *P < 0.05 versus control, Student’s t-test).
p, Quantification of VE-cadherin gap size index in control and GLULKD
EC monolayers (mean ± s.e.m.; n = 4 independent experiments; *P < 0.05
versus control, Student’s t-test). q–v, Corresponding representative
images of monolayer control (q, s, u) and GLULKD (r, t, v) ECs stained
for VE-cadherin (q, r, u, v) and F-actin (s, t, u, v). Yellow arrows in
r point to discontinuous VE-cadherin junctions and yellow asterisks
indicate intercellular gaps. w, Quantification of transendothelial electrical
resistance in control and GLULKD EC monolayers (mean ± s.e.m.; n = 4
independent experiments; *P < 0.05 versus control, Student’s t-test at each
time point). x–z, Quantification (x) of Evans blue dye extracted from the
ears of control and MSO-treated mice, induced by topical application of
mustard oil (n = 4 mice for each condition, *P < 0.05; Student’s t-test), and
representative pictures of the leakage of Evans blue dye into the ear tissue
in control (y) and MSO-treated (z) mice. Exact P values: 0.0030 (k); 0.0036
(n); continuous control versus GLULKD: 0.0005; discontinuous control
versus GLULKD: 0.0005 (o); 0.0356 (p); 0.0181 (w); 0.0002 (x). Scale bars:
20 µm (a–h, l, m), 10 µm (i, j, q–v). AU, arbitrary units.
Extended Data Fig. 5 | Enzymatic activity of GS and its role in EC
migration. a, Scheme for the ¹⁵NH₄⁺ labelling of glutamate and glutamine
with unlabelled carbons (blue) and labelled nitrogens (red). b, ¹⁵N
incorporation into glutamine (measured as the percentage of isotope
enrichment in glutamine either as M + 1 (singly labelled) or M + 2
(doubly labelled), 30 min after adding ¹⁵NH₄⁺) in medium with dialysed
serum and different glutamine concentrations (mean ± s.e.m.; n = 3
independent experiments; one-way ANOVA with Dunnett's multiple
comparisons versus 4 mM; *P < 0.05). c, ¹⁵N incorporation into glutamate
(measured as the percentage of isotope enrichment in
M + 1) and glutamine (measured as the percentage of isotope enrichment
in M + 1 (singly labelled) and M + 2 (doubly labelled)), 30 min after
adding increasing concentrations of ¹⁵NH₄Cl (mean ± s.e.m.; n = 3
independent experiments). d, Scheme of glutamine labelling from
[U-¹³C]glutamate with unlabelled nitrogens (blue) and labelled carbons (red).
e, Contribution of labelled [U-¹³C]glutamate to intracellular glutamine
at various glutamine concentrations (percentage of isotope enrichment
in M + 5 glutamine and glutamate, 30 min after adding the tracer)
(mean ± s.e.m.; n = 3 independent experiments; one-way ANOVA with
Dunnett's multiple comparisons versus 4 mM; *P < 0.05). f, Scheme for the
contribution of carbons from [U-¹³C]glucose to glutamine, with labelled
carbons (red) and unlabelled carbons (blue). Incorporation is shown after
one turn of the tricarboxylic acid (TCA) cycle. g, Total contribution of
carbons from [U-¹³C]glucose to α-ketoglutarate, glutamate and glutamine
in ECs in medium with or without glutamine, 48 h after adding the tracer
(mean ± s.e.m.; n = 3 independent experiments; *P < 0.05 versus total
contribution in glutamine at 0.6 mM external glutamine, one-way ANOVA
with Dunnett's multiple comparisons). h, Incorporation of ¹⁵N into
glutamine (measured as the percentage of isotope enrichment in M + 1
(singly labelled) and M + 2 (doubly labelled), 30 min after adding ¹⁵NH₄⁺)
in ECs and HEPG2 cells (mean ± s.e.m.; n = 4 independent experiments.
ND, not detected). i, ¹³C-glutamine uptake kinetics in control,
MSO-treated and GLUL^KD ECs and subsequent conversion to glutamate.
See Methods for explanation of the different time points. Data are for the
M + 5 isotopomer, as a percentage of the total intracellular glutamine or
glutamate pool (mean ± s.e.m.; n = 3 independent experiments, except
for 30 min for which n = 1 experiment; no statistical differences between
control, MSO-treated and GLUL^KD ECs were observed for glutamine or for
glutamate; one-way ANOVA with Dunnett's multiple comparison versus
control at each time point; no statistical analysis was performed at 30 min).
j, ¹⁴C-glutamine uptake in control and GLUL^KD ECs (mean ± s.e.m.; n = 5
independent experiments; NS, P > 0.05 versus control, one-sample t-test).
k, Ratio of intracellular glutamine and glutamate levels in control and
GLUL^KD ECs (mean ± s.e.m.; n = 3 independent experiments; NS, P > 0.05
versus control, Student's t-test). l, Velocity measurement of control and
GLUL^KD ECs at different glutamine concentrations (mean ± s.e.m.;
n = 4 independent experiments; *P < 0.05 versus corresponding control,
mixed-models R statistics). m, n, Effect of glutamine concentration on
sprout number (m) and total sprout length (n) in control and GLUL^KD
spheroids (mean ± s.e.m.; n = 3 independent experiments; *P < 0.05
versus corresponding control, mixed-models R statistics). o, p, Number of
sprouts per spheroid (o) and total sprout length (p) in control and
MSO-treated EC spheroids (mean ± s.e.m.; n = 3 independent experiments;
*P < 0.05 versus control, paired Student's t-test). q-s, Effect of
MSO-treatment on EC motility parameters: wound closure of mitomycin
C-treated ECs (q) (mean ± s.e.m.; n = 11 independent experiments;
*P < 0.05 versus control, Student's t-test), lamellipodial area (r)
(mean ± s.e.m.; n = 10 independent experiments; *P < 0.05 versus control,
paired Student's t-test) and F-actin levels, 1 h after latrunculin wash-out (s)
(mean ± s.e.m.; n = 4 independent experiments; *P < 0.05 versus control,
one-sample t-test). t, [³H]thymidine incorporation in control and
MSO-treated ECs (mean ± s.e.m.; n = 3 independent experiments; NS,
P > 0.05 versus control, one-sample t-test). Exact P values: M + 1 0.025 mM
versus M + 1 4 mM: 0.0096; M + 1 0.6 mM versus M + 1 4 mM: 0.1206;
M + 2 0.025 mM versus M + 2 4 mM: 0.0839; M + 2 0.6 mM versus
M + 2 4 mM: 0.9921 (b); Glu M + 5 0.6 mM versus Glu M + 5 4 mM:
0.9372; Glu M + 5 0.025 mM + MSO versus Glu M + 5 4 mM: 0.0034;
Glu M + 5 0.025 mM versus Glu M + 5 4 mM: 0.0215; Gln M + 5 0.6 mM
versus Gln M + 5 4 mM: 0.9297; Gln M + 5 0.025 mM + MSO versus
Gln M + 5 4 mM: 0.9961; Gln M + 5 0.025 mM versus Gln M + 5 4 mM:
0.0268 (e); α-keto 0.6 mM versus Gln 0.6 mM: 0.0001; Glu 0.6 mM
versus Gln 0.6 mM: 0.0001; Gln 0 mM versus Gln 0.6 mM: 0.0285 (g);
Gln 0.5 min: control versus MSO: 0.4846; control versus GLUL^KD: 0.5904; Gln
10 min: control versus MSO: 0.6709; control versus GLUL^KD: 0.6910;
Gln 20 min: control versus MSO: 0.5896; control versus GLUL^KD: 0.6784;
Glu 0.5 min: control versus MSO: 0.9774; control versus GLUL^KD: 0.8810;
Glu 10 min: control versus MSO: 0.0502; control versus GLUL^KD: 0.9598;
Glu 20 min: control versus MSO: 0.9782; control versus GLUL^KD: 0.7783
(i); 0.6623 (j); 0.6704 (k); control versus GLUL^KD 0.1 mM: 0.0054; control
versus GLUL^KD 0.6 mM: 0.0247; control versus GLUL^KD 2 mM: 0.0017 (l);
control versus GLUL^KD 0.6 mM and 10 mM: <0.0001 (m); control versus
GLUL^KD 0.6 mM and 10 mM: <0.0001 (n); 0.0313 (o); 0.0075 (p); 0.0019
(q); 0.0116 (r); 0.0091 (s); 0.5110 (t). α-keto, α-ketoglutarate; GDH,
glutamate dehydrogenase.
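
The panels in this figure report isotope enrichment as the percentage of a metabolite pool that carries one (M + 1), two (M + 2) or five (M + 5) heavy atoms. As a rough illustration of how such percentages are conventionally derived from mass isotopologue intensities, here is a minimal Python sketch. The function name and the intensity values are hypothetical, and the sketch omits natural-abundance correction, which tracer analyses normally apply; it is not presented as the authors' analysis pipeline.

```python
def fractional_enrichment(intensities):
    """Return percent enrichment for each isotopologue.

    `intensities` maps a mass shift (0 for M + 0, 1 for M + 1, ...) to a
    measured ion intensity. No natural-abundance correction is applied.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return {shift: 100.0 * value / total for shift, value in intensities.items()}


# Hypothetical glutamine intensities after adding a 15N tracer
# (illustrative numbers only, not data from the paper).
glutamine = {0: 700.0, 1: 250.0, 2: 50.0}
enrichment = fractional_enrichment(glutamine)
print(enrichment)
```

Expressing each isotopologue as a share of the summed pool means the percentages always add up to 100, which is why the captions can compare M + 1 and M + 2 enrichment across glutamine concentrations on a common scale.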
Extended Data Fig. 6 | Rescuing the phenotype associated with GLUL^KD
in vitro. a, Schematic of the DORA-RHOA-FRET biosensor, depicting
from N- to C-terminal the circular permutated RHOA effector protein
kinase N (cpPKN), the dimeric circular permutated Venus (dcpVen), the
ribosomal protein-based linkers (L9), the dimeric Cerulean3 (dCer3)
and RHOA. b-m, Representative images of control (b-d), MSO-treated
(e-g), GLUL^KD (h-j) and RHOJ^KD (k-m) ECs after staining for F-actin
(phalloidin) (b, d, e, g, h, j, k, m) and pMLC (c, d, f, g, i, j, l, m).
n, Quantification of the pMLC immunoreactivity (mean ± s.e.m.; n = 5
independent experiments; *P < 0.05 versus control, one-sample t-test).
o-t, Representative images of control (o, q, s) and GLUL^KD (p, r, t) EC
spheroids treated with vehicle (o, p) or the ROCK inhibitors Y27632
(q, r) or fasudil hydrochloride (Fasu.; s, t). u, v, Quantification of the
number of sprouts per spheroid (u) and sprout length (v) (mean ± s.e.m.;
n = 3 independent experiments; *P < 0.05 and NS, P > 0.05 versus
untreated control, one-way ANOVA with Dunnett's multiple comparisons
versus untreated control). w, Quantification of the lamellipodial area
in vehicle- or fasudil-hydrochloride-treated control and GLUL^KD ECs
(mean ± s.e.m.; n = 6 independent experiments; *P < 0.05 and NS,
P > 0.05 versus untreated control, one-way ANOVA with Dunnett's
multiple comparisons versus untreated control). x, Quantification of
the lamellipodial area in vehicle-, ML7- or peptide-18 (pep.18)-treated
GLUL^KD and control ECs (mean ± s.e.m.; n = 4 independent experiments
of which 3 experiments included the ML7-treatment; *P < 0.05 versus
untreated control, one-way ANOVA with Dunnett's multiple comparisons
versus untreated control). y, Scratch wound closure in vehicle-, ML7-
or peptide-18-treated GLUL^KD and control ECs (mean ± s.e.m.; n = 3
independent experiments; *P < 0.05 versus untreated control, one-way
ANOVA with Dunnett's multiple comparisons versus untreated control).
z, Fold changes (versus untreated control ECs) in F-actin levels from
phalloidin-stained vehicle-, ML7- or peptide-18-treated GLUL^KD ECs
(mean ± s.e.m.; n = 4 independent experiments of which 3 included the
peptide 18-treatment; *P < 0.05 versus untreated control, one-sample
t-test). aa, Fold changes (versus untreated control ECs) in pMLC levels
from pMLC-immunostained vehicle-, ML7- or peptide-18-treated
GLUL^KD ECs (mean ± s.e.m.; n = 4 independent experiments of which 3
included the peptide 18-treatment; *P < 0.05 versus untreated control,
one-sample t-test). Exact P values: MSO: 0.0372; GLUL^KD: 0.0060; RHOJ^KD:
0.0051 (n); GLUL^KD versus control: 0.0045; Fasu. versus control: 0.9596;
GLUL^KD + Fasu. versus control: 0.8857 (u); GLUL^KD versus control:
0.0199; Fasu. versus control: 0.8309; GLUL^KD + Fasu. versus control:
0.9327 (v); GLUL^KD versus control: 0.0074; Fasu. versus control: 0.5906;
GLUL^KD + Fasu. versus control: 0.9900 (w); GLUL^KD versus control:
0.0011; GLUL^KD + ML7 versus control: 0.0079; GLUL^KD + pep.18 versus
control: 0.0017 (x); GLUL^KD versus control: 0.0034; GLUL^KD + ML7 versus
control: 0.0022; GLUL^KD + pep.18 versus control: 0.0040 (y); GLUL^KD:
0.0058; ML7: 0.0072; pep.18: 0.0888 (z); GLUL^KD: 0.0369; ML7: 0.0021;
pep.18: 0.1672 (aa). Scale bars: 20 μm (b-m), 100 μm (o-t). For gel source
images, see Supplementary Fig. 1.
Extended Data Fig. 7 | Rho GTPase localization and interaction with
GS. a, Co-IP assays showing no detectable interaction between GS and
RHOA or RHOC (red asterisk indicates a non-specific band (also present
in the IgG controls and unaffected by silencing of RHOA or RHOC)).
Image shown is representative of 3 independent experiments. b, Co-IP
of overexpressed GLUL and RHOJ-eGFP or ΔN20-RHOJ-eGFP in ECs.
Quantifications are mean ± s.e.m.; n = 4 independent experiments;
*P < 0.05, one-sample t-test. In some of the experiments, the expression
of ΔN20-RHOJ-eGFP was lower than the expression of RHOJ-eGFP.
To correct for this, densitometric quantification was performed and
signals in immunoprecipitation lanes were normalized to input signals.
c, Immunoblotting for RHOA and RHOC on cytosolic (c) and membrane
(m) fractions of ECs with NaK as membrane marker and GAPDH
as cytosolic marker. Image shown is representative of 3 independent
experiments. d, BiFC assay with GS coupled to the N-terminal half of
eGFP, and RHOJ coupled to the C-terminal half of eGFP. Only when GS
and RHOJ are in close proximity do the two eGFP half-sites complement
each other and form a functional eGFP. e, Percentage of ECs displaying
BiFC upon overexpression of GLUL fused to the N-terminal eGFP half
together with either RHOJ or ΔN20-RHOJ fused to the C-terminal eGFP
half. Data are mean ± s.e.m.; n = 3
independent experiments; *P < 0.05; Student's t-test. f, Schematic of
SPT-PALM imaging under TIRF illumination with the plasma membrane
depicted at the top. The TIRF region is bright (whereas the part outside
the TIRF region is greyed out) and contains the plasma membrane and
its immediately adjacent space (not shown at exact relative dimensions).
Weight and number of arrowheads represent the velocity of single particles
(the photoswitchable fluorescent protein (PSFP) or the PSFP coupled to
the protein of interest (here GS)). The PSFP is activated upon entry into
the TIRF region and is colour-coded differently inside and outside of
the TIRF region. PSFP-GS displays reduced velocity in the TIRF region,
presumably because of palmitoylation and membrane association of
GS. g, Scheme for the in-cell labelling of proteins with clickable
alkyne-containing palmitoylation probes and subsequent biotin-azide clicking.
X represents a palmitoylated protein, N₃ is the biotin-coupled azide.
h, i, Rate of CoA release from palmitoyl-CoA as a readout for recombinant
human GS autopalmitoylation while varying either the doses of
palmitoyl-CoA (h) or the amounts of recombinant GS (i) (mean ± s.e.m.; n = 4
independent experiments for h and n = 5 for i; *P < 0.05, one-way
ANOVA with Dunnett's multiple comparisons versus 0 μM palmitoyl-CoA
or versus 0.5 μg recombinant GS). j, Representative GS immunoblot (of
3 independent experiments) for the binding of recombinant human GS
to palmitoyl-CoA agarose. IF, input fraction; FT, flow through; W8, wash
fraction 8; SDS is the eluate. k-m, Representative images of RHOJ-eGFP
localization in ECs under vehicle-treatment (k) or treatment with 2BP
(a pan-palmitoylation inhibitor) (l). Red arrowheads indicate eGFP signal
at membrane ruffles, which was quantified as the percentage of the total
cellular area (m) (mean ± s.e.m.; n = 4 independent experiments; *P < 0.05
versus vehicle-treated, paired Student's t-test). n-p, Representative images
of ECs overexpressing wild-type RHOJ-eGFP (n), RHOJ-eGFP encoding
the point mutation C3A (o) or RHOJ-eGFP encoding the point mutation
C11A (p). Red arrowheads indicate RHOJ at the plasma membrane.
ECs that are not completely in the field of view have been masked out in
blue. q, Quantification of the RHOJ-eGFP positive area at the plasma
membrane as a percentage of the total cell area. Data are mean ± s.e.m.;
n = 5 independent experiments; *P < 0.05; one-way ANOVA with
Dunnett's comparison versus wild-type RHOJ. r, RHOJ immunoblotting
on membrane versus cytosolic fractions from ECs overexpressing
wild-type RHOJ-eGFP (RHOJ^WT), RHOJ-eGFP encoding the point mutation
C3A (RHOJ^C3A) or RHOJ-eGFP encoding the point mutation C11A
(RHOJ^C11A), with NaK as membrane marker and GAPDH and α-tubulin
as cytosolic markers. s, Densitometric quantification of RHOJ/NaK as
determined in r. Data are mean ± s.e.m.; n = 6 independent experiments;
*P < 0.05; one-sample t-test. t, RHOJ activity in ECs under treatment with
vehicle or 2BP (blots are representative of 3 independent experiments;
densitometric quantification in arbitrary units is mean ± s.e.m.; *P < 0.05,
paired Student's t-test versus vehicle-treated). u, RHOJ immunoblotting
of control and GLUL^KD ECs overexpressing RHOJ (RHOJ) subjected to
acyl-resin-assisted capture. The cleaved bound fraction (cBF) represents
palmitoylated RHOJ. IF is the input fraction, whereas the cleaved
unbound fraction (cUF) and the preserved bound fraction (pBF) are
controls showing the depletion of RHOJ from the thioester-cleaving
reagent and the near absence of non-specific binding of RHOJ to the
resin (see Methods). Densitometric quantification of cBF/IF is shown
(mean ± s.e.m.; n = 3 independent experiments; *P < 0.05, one-sample
t-test versus control). v, Left, autopalmitoylation enables endothelial GS to
interact directly (or indirectly) with the Rho GTPase RHOJ and to sustain
the palmitoylation, membrane localization and activity of RHOJ (reflected
by GTP binding). RHOJ activity then sustains normal EC migration and
lamellipodia formation, and keeps actin stress-fibre formation at levels
that promote normal EC migration and vessel branching in vivo. Through
mechanisms that are not completely understood, active RHOJ inhibits
signalling of the RHOA/B/C-ROCK-(p)MLC pathway (itself known to
promote stress-fibre formation). The relative contribution of a direct effect
of RHOJ on migration versus the indirect effect through
RHOA/B/C-ROCK-(p)MLC is yet to be determined. Reduced opacity of RHOA/B/C,
ROCK and (p)MLC indicates reduced signalling of this pathway. Right,
loss of endothelial GS renders RHOJ less active (visually reflected by
fewer palmitoylated, membrane-bound RHOJ proteins), and reduces the
inhibition of the RHOA/B/C-ROCK-(p)MLC pathway. The resulting
excessive stress-fibre formation causes ECs to lose migratory capacity and
reduces vessel branching in vivo. Dashed lines indicate reduced activity;
the red cross indicates GS blockade; the question mark indicates unknown
mechanisms. Exact P values are as follows: 0.0153 (b); 0.0334 (e); 2 versus
0 μM: 0.6327; 5 versus 0 μM: 0.2841; 10 versus 0 μM: 0.1090; 20 versus
0 μM: 0.0339; 40 versus 0 μM: 0.0034 (h); 1 versus 0.5 μg: 0.5806; 2 versus
0.5 μg: 0.0319; 4 versus 0.5 μg: 0.0037; 8 versus 0.5 μg: 0.0001; 16 versus
0.5 μg: 0.0001 (i); 0.0313 (m); RHOJ C3A versus RHOJ WT: 0.0001; RHOJ
C11A versus RHOJ WT: 0.0001 (q); RHOJ C3A versus RHOJ WT: 0.0015;
RHOJ C11A versus RHOJ WT: 0.0007 (s); 0.0051 (t); 0.0461 (u). Scale bar,
200 μm (k, l, n-p). For gel source images, see Supplementary Fig. 1.
Extended Data Fig. 8 | Possible molecular model of GS
autopalmitoylation. a, Structure of human GS, showing its
bifunnel-shaped catalytic site. A schematic of the GS decamer is shown from
the top and front views with individual subunits A and B labelled and
coloured grey and green, respectively. On the right is a close-up of the
bifunnel catalytic site that is formed between subunits A and B. The GS
decamer has ten active sites, each located at the interface of two adjacent
subunits. ATP enters from the top, whereas glutamate enters from below;
manganese ions (Mn²⁺) are shown as grey spheres. b, Molecular dynamics
simulation of palmitoyl-CoA in the catalytic cleft of GS predicts that,
whereas the head of palmitoyl-CoA is tightly bound to the
adenine-binding site, the tail can point in opposing directions with respect to
the principal axis of the protein. The most representative structures of
the two alternative conformations (A and B) observed during the long
molecular dynamics simulations for palmitoyl-CoA binding to GS (in
blue, seen from two different perspectives) are shown in red (A, tail
bending upwards) and green (B, tail bending downwards). c, Detailed
view of conformation A, which is the main conformation. The sulfur
atom of palmitoyl-CoA (which is immediately adjacent to the carbon on
which the nucleophilic attack occurs) (coloured yellow) approaches the
highly conserved cysteine 209 (also coloured yellow), with an interatomic
distance (S-S) that, during the simulations, reversibly fluctuates between 3
and 8 Å. The hydrophobic tail positions itself along grooves characterized
by the presence of hydrophobic residues. Colour coding is as follows:
carbon, grey; nitrogen, blue; phosphorus, gold; oxygen, red. Cysteines
and serines within 5 Å of the palmitoyl tail are highlighted in yellow and
orange, respectively. The hydrophobic residues around the tail are shown
in green. d, Detailed view of conformation B, in which the tail is found in
a buried hydrophobic cleft, with the sulfur at a distance of 5 Å or less from
the conserved serines 65 and 75 and the tail occupying the site of the GS
inhibitor MSO. Details of the extensive steric clash between MSO and the
secondary binding pose (B) observed in palmitoyl-CoA MD simulations
are shown. Palmitoyl-CoA is represented as sticks, with standard atomic
colours as stated in c. MSO is shown in cyan and its position is taken from
entry 2QC8 of the Protein Data Bank. Cysteines and serines within 5 Å
of the palmitoyl tail are highlighted in yellow and orange, respectively.
The hydrophobic residues around the tail are shown in green. e, GS
immunoblotting after streptavidin pull-down of biotin-azide-clicked
lysates from 16C-YA (palmitoylation probe) labelled HEK-293T cells
overexpressing wild-type GLUL or GLUL with a point mutation in C209.
The input shows the level of GS overexpression. A representative blot
from 4 independent experiments is shown. f, g, Quantification of total
sprout length (f) and number of sprouts per spheroid (g) for control and
GLUL^KD ECs with or without overexpression of shRNA-resistant GLUL
encoding the point mutation C209A (rGLUL^C209A-OE) (mean ± s.e.m.; n = 4
independent experiments; *P < 0.05 versus control, one-way ANOVA with
Dunnett's multiple comparison versus control). h, Schematic of protein
autopalmitoylation. Upon binding of palmitoyl-CoA to the protein,
free CoA (grey oval) is released and can be detected. i, Recombinant
wild-type GS and GS with point mutations R324C and R341C were
incubated with different concentrations of palmitoyl-CoA in a cell-free
system at physiological pH. The amount of CoA released per minute
was determined as a direct readout for protein autopalmitoylation. Data
are mean ± s.e.m. of 3 (R324C and R341C) and 4 (WT) independent
experiments. NS, P > 0.05; *P < 0.05 according to two-way ANOVA
comparing the entire dose-response to the dose-response of wild-type
GS. j, Different amounts of recombinant wild-type, R324C and R341C
GS were incubated with a fixed concentration of palmitoyl-CoA (40 μM),
and the amount of CoA released per minute was determined as readout
for autopalmitoylation. Data are mean ± s.e.m. of 4 (R324C and R341C)
and 5 (wild-type) independent experiments. NS, P > 0.05; *P < 0.05
according to two-way ANOVA comparing the entire dose-response to the
dose-response of wild-type GS. The data for wild-type GS from i and j are
also included in Extended Data Fig. 7 as stand-alone data, but are included
here for comparison purposes. k, Boyden chamber migration for control,
GLUL^KD, GLUL^KD + rGLUL^WT, GLUL^KD + rGLUL^R341C-OE and GLUL^KD
+ rGLUL^R324C-OE ECs, all under mitomycin C-treatment (mean ± s.e.m.;
n = 3 independent experiments; NS, P > 0.05; *P < 0.05, one-way ANOVA
with Dunnett's multiple comparison versus control). Exact P values:
GLUL^KD versus control: 0.0004; GLUL^KD + rGLUL^C209A-OE versus control:
0.0004 (f); GLUL^KD versus control: 0.0001; GLUL^KD + rGLUL^C209A-OE
versus control: 0.0001 (g); R324C versus WT: 0.8228; R341C versus
WT: 0.7530 (i); R324C versus WT: 0.1331; R341C versus WT: 0.0003
(j); GLUL^KD versus control: 0.0054; GLUL^KD + rGLUL^WT versus control:
0.8152; GLUL^KD + rGLUL^R341C-OE versus control: 0.3645; GLUL^KD +
rGLUL^R324C-OE versus control: 0.2118 (k). For gel source images, see
Supplementary Fig. 1.
Extended Data Table 1 | Weight, haematological and blood plasma parameters for six-week-old Glul^vECKO mice and control littermates

                                Ctrl             vECKO
Hematological parameter
  WBCs (×10³ μl⁻¹)              4.42 ± 0.45      4.63 ± 0.49
  neutrophils (×10³ μl⁻¹)       0.56 ± 0.13      0.40 ± 0.06
  lymphocytes (×10³ μl⁻¹)       3.65 ± 0.41      4.05 ± 0.44
  monocytes (×10³ μl⁻¹)         0.08 ± 0.02      0.03 ± 0.01*
  eosinophils (×10³ μl⁻¹)       0.05 ± 0.01      0.06 ± 0.01
  basophils (×10³ μl⁻¹)         0.08 ± 0.02      0.09 ± 0.02
  platelets (×10³ μl⁻¹)         766 ± 37         720 ± 72
  RBCs (×10⁶ μl⁻¹)              6.89 ± 0.15      6.82 ± 0.14
  hemoglobin (g dl⁻¹)           10.63 ± 0.21     10.66 ± 0.23
  hematocrit (%)                54.99 ± 1.17     55.00 ± 1.28
Liver parameter
  plasma AST (U l⁻¹)            49.00 ± 4.00     49.50 ± 4.50
  plasma ALT (U l⁻¹)            26.50 ± 0.50     27.50 ± 4.50
  plasma γ-GT (U l⁻¹)           < 3              < 3
  plasma bilirubin (mg dl⁻¹)    < 0.18           < 0.18
Inflammation parameter
  CRP                           < 0.3            < 0.3
Weight
  males (g)                     18.80 ± 0.57     19.21 ± 1.22
  females (g)                   17.61 ± 0.80     17.77 ± 0.50

Values are mean ± s.e.m. of n = 14 (control; Ctrl) versus n = 17 (Glul^vECKO) mice. *P = 0.0232 versus control, Student's t-test. WBCs, white blood cells; RBCs, red blood cells; AST, aspartate amino transferase; ALT, alanine amino transferase; γ-GT, gamma-glutamyl transferase; CRP, C-reactive protein.
Extended Data Table 2 | Alignment of the amino acid sequence encompassing the C209 residue across different species

[Table body not recovered. Column headers: Species Name, Identity, Amino Acid Sequence]

Multiple sequence alignment showing the conservation of amino acid C209 (in red) in GS across different species. Here the sequence alignment of 41 residues surrounding this cysteine is shown for up to 100 of the closest sequence identity matches with the GS of H. sapiens obtained with BLAST from the UniProtKB/Swiss-Prot database. C209 is mostly conserved across species, and when not conserved it is often substituted by residues (Ser or Thr) that can (in theory) be palmitoylated as well. In E. coli (shown at the bottom), for example, a Thr is found at the structurally equivalent position (T210, highlighted in yellow). If several GS isoforms are known for the same species, only the one with the highest percentage identity to that of H. sapiens is shown.
Malaria parasite translocon structure
and mechanism of effector export


Chi-Min Ho, Josh R. Beck, Mason Lai, Yanxiang Cui, Daniel E. Goldberg, Pascal F. Egea & Z. Hong Zhou


The putative Plasmodium translocon of exported proteins (PTEX) is essential for transport of malarial effector proteins 
across a parasite-encasing vacuolar membrane into host erythrocytes, but the mechanism of this process remains 
unknown. Here we show that PTEX is a bona fide translocon by determining structures of the PTEX core complex at 
near-atomic resolution using cryo-electron microscopy. We isolated the endogenous PTEX core complex containing 
EXP2, PTEX150 and HSP101 from Plasmodium falciparum in the ‘engaged’ and ‘resetting’ states of endogenous 
cargo translocation using epitope tags inserted using the CRISPR-Cas9 system. In the structures, EXP2 and PTEX150 
interdigitate to form a static, funnel-shaped pseudo-seven-fold-symmetric protein-conducting channel spanning the 
vacuolar membrane. The spiral-shaped AAA+ HSP101 hexamer is tethered above this funnel, and undergoes pronounced 
compaction that allows three of six tyrosine-bearing pore loops lining the HSP101 channel to dissociate from the cargo, 
resetting the translocon for the next threading cycle. Our work reveals the mechanism of P. falciparum effector export, 
and will inform structure-based design of drugs targeting this unique translocon. 


Malaria has devastated major civilizations since the dawn of humanity,
and remains a considerable burden to society; it is responsible for more
than 200 million cases and nearly half a million deaths each year. This
infectious disease is caused by Plasmodium parasites, which invade and
reproduce within human erythrocytes, inducing the clinical symptoms
of malaria. These parasites export hundreds of effector proteins that
extensively remodel host erythrocytes, which have limited capacity
for biosynthesis. Collectively known as the exportome, these proteins
create the infrastructure necessary to import nutrients, export
waste, and evade splenic clearance of infected erythrocytes. Most of
these proteins bear a five-residue motif called the Plasmodium export
element (PEXEL). The malaria parasite conceals itself inside a
parasitophorous vacuole, which is derived from invagination of the host cell
plasma membrane during invasion (Fig. 1a). Following secretion into
the parasitophorous vacuole, proteins destined for export are unfolded
and transported across the parasitophorous vacuole membrane (PVM)
into the host cell in an ATP-dependent process. To accomplish
this, it has been hypothesized that the parasite has evolved a unique
membrane protein complex, the Plasmodium translocon of exported
proteins (PTEX). PTEX is the only known point of entry to the host
cell for exported proteins and is therefore an attractive drug target, as
disrupting PTEX blocks delivery of key virulence determinants and
induces parasite death.

PTEX has been proposed to be a membrane protein complex larger
than 1.2 MDa, with a core composed of the HSP101 ATPase and two
novel proteins, PTEX150 and EXP2 (Fig. 1a). HSP101 belongs to
the class 1 Clp/HSP100 family of AAA+ ATPases, PTEX150 has no
known homologues beyond the Plasmodium genus, and EXP2 is a PVM
protein that is conserved among vacuole-dwelling apicomplexans.
All three core components are essential for protein export and
parasite survival. A model of PTEX-mediated translocation has been
proposed, in which HSP101 unfolds and threads proteins through an
oligomeric EXP2 transmembrane channel spanning the PVM, with
PTEX150 having a structural role between EXP2 and HSP101.
However, without structural information, the global architecture of
PTEX, the stoichiometry of its components and its molecular
mechanism have remained unclear.

In this study, we purify PTEX directly from the human malaria
parasite P. falciparum and determine the structure of the complex in multiple
functional states at near-atomic resolution using cryo-electron
microscopy (cryo-EM). Our atomic models reveal the architecture
and mechanism of this unique translocon and open a path towards
development of novel therapeutics against this promising anti-malarial
drug target.


Architecture of the PTEX core complex 

To purify PTEX from P. falciparum, we used CRISPR-Cas9 editing
to introduce a 3×Flag epitope tag on the C terminus of
endogenously expressed HSP101 (Extended Data Fig. 1a-c) and purified
the endogenously assembled PTEX core complex directly from
P. falciparum cultured in human erythrocytes (Extended Data
Fig. 1d-f). Cryo-EM analysis yielded two distinct conformations of
PTEX particles, one extended (195 Å) and the other compact (175 Å)
(Fig. 1b, c, Extended Data Table 1). Endogenous cargo polypeptide
densities are visible in the central pore of HSP101 in both structures
(Fig. 1b, c, Extended Data Figs. 2, 3). On the basis of differences
in the arrangement of HSP101 subunits relative to the cargo between
the two conformations, we designated them as the 'engaged' and
'resetting' states, respectively. Both maps are at near-atomic
resolution, varying from 3.0-3.6 Å in the transmembrane and core regions
to 5-8 Å in the periphery (Fig. 1b, c, Extended Data Fig. 4). As
most regions exhibit clear sidechain densities throughout both
maps (Fig. 1b, c, Extended Data Fig. 5, Supplementary Videos 1, 2),
we were able to build de novo atomic models of the three constituent
proteins for both conformational states (Fig. 1d, e). Each model
contains 20 subunits with 6,898 modelled amino acid residues.
Models of all subunits were built independently, as their
conformations varied.
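
As a quick consistency check on the numbers in this section, the 6:7:7 stoichiometry reported for HSP101, PTEX150 and EXP2 accounts for the 20 subunits in each model. The short Python sketch below does this arithmetic; the variable names are illustrative, and the per-subunit average is only a rough figure, since the three proteins differ in length and not every residue is modelled.

```python
# Protomer counts per complex, taken from the reported 6:7:7 stoichiometry.
protomers = {"HSP101": 6, "PTEX150": 7, "EXP2": 7}

total_subunits = sum(protomers.values())
modelled_residues = 6898  # modelled amino acid residues per atomic model

# Mean number of modelled residues per subunit (a rough average only).
mean_residues = modelled_residues / total_subunits

print(total_subunits)           # 20
print(round(mean_residues, 1))  # 344.9
```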
Fig. 1 | Global architecture of the PTEX core complex in two cargo-bound states. a, Schematic of a human erythrocyte infected with P. falciparum.
PPM, parasite plasma membrane; PVM, parasitophorous vacuole membrane. b-e, Cryo-EM maps (b, c) and atomic models (d, e) of the PTEX core
complex. Horizontal lines represent the PVM bilayer, estimated on the basis of the detergent belt density, which is visible at lower thresholds (see
Extended Data Fig. 7). PV, parasitophorous vacuole; RBC, red blood cell. f-k, Top and side views of the HSP101 (f, g), PTEX150 (h, i) and EXP2 (j, k)
cryo-EM maps, coloured by protomer. l-n, Pore radius (l) and protein-conducting channel (m, n) calculated using HOLE.


Both structures show that PTEX is a tripartite membrane protein
complex with a 6:7:7 stoichiometry and a calculated mass of 1.6 MDa,
composed of a hexameric HSP101 protein-unfolding motor tethered to
a PVM-spanning, pseudosymmetric funnel formed by seven protomers
of EXP2 interdigitating with seven protomers of PTEX150 (Fig. 1d–k,
Supplementary Video 3). Two transiently associated accessory
proteins, PTEX88 and TRX2, are not seen in our structures. At the PVM,
each EXP2 monomer contributes a single transmembrane helix to form
a seven-fold (C7)-symmetric protein-conducting channel spanning
the membrane (Fig. 1j, k). Six HSP101 protomers are tethered on top
of the PTEX150–EXP2 funnel in a hexameric right-handed spiral,
with a gap between the bottom-most and top-most protomers (Fig. 1f,
g, Supplementary Video 4). The HSP101 hexamer is oriented such
that a single unbroken channel extends from the top of the HSP101
hexamer to the bottom of the heptameric EXP2 transmembrane pore
(Fig. 1l–n, Extended Data Fig. 2d). The most constricted point along
the channel occurs in HSP101, measuring 4 Å and 10 Å in diameter
in the engaged and resetting states, respectively (Fig. 1l). The seventh
EXP2 and PTEX150 protomers are situated under the gap between
HSP101 protomers 1 and 6, accommodating the remarkable symmetry
mismatch between the asymmetric HSP101 hexamer and the
pseudo-seven-fold-symmetric PTEX150–EXP2 tetradecamer (Fig. 1f–k,
Extended Data Fig. 2e–j). Analyses of our PTEX150 and EXP2 structures
with four commonly used structural similarity search programs
found no consistent structural similarities to any known proteins,
including the pore-forming toxin haemolysin E, with which EXP2
was previously speculated to share structural homology. Below, we
describe the structural details of the individual proteins in the engaged
state, followed by a comparison of the two states that suggests a
mechanism of translocation.
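
As a rough consistency check on the stated stoichiometry and mass, the subunit count and total mass can be tallied as in the sketch below. The HSP101 hexamer (598 kDa) and EXP2 heptamer (216 kDa) masses are the values quoted elsewhere in this article; the PTEX150 mass is not stated and is estimated here from its 993 residues using an assumed average of 110 Da per residue, so the total is only approximate.

```python
# Rough tally of PTEX core complex composition (6:7:7 stoichiometry).
# Assumption: ~110 Da average residue mass, used only for PTEX150,
# whose mass is not given in the text.
AVG_RESIDUE_MASS_DA = 110.0

copies = {"HSP101": 6, "PTEX150": 7, "EXP2": 7}

hsp101_hexamer_kda = 598.0   # quoted in the text
exp2_heptamer_kda = 216.0    # quoted in the text
ptex150_monomer_kda = 993 * AVG_RESIDUE_MASS_DA / 1000.0  # estimate

total_subunits = sum(copies.values())
total_mass_kda = (
    hsp101_hexamer_kda
    + exp2_heptamer_kda
    + copies["PTEX150"] * ptex150_monomer_kda
)

print(f"subunits: {total_subunits}")
print(f"estimated mass: {total_mass_kda / 1000:.2f} MDa")
```

Under these assumptions the tally gives 20 subunits and a mass that rounds to 1.6 MDa, in line with the figures quoted above.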


EXP2 forms a protein-conducting channel across PVM
Residues G27–S234 of EXP2 are well-resolved in our structure,
accounting for 80% of the mature protein (Extended Data Fig. 6a).
EXP2 is a single-pass transmembrane protein consisting of a kinked
60 Å N-terminal transmembrane helix followed by a globular body
domain and ending in an assembly domain composed of a linker helix
followed by the assembly strand (Fig. 2a, b). The body domain contains
five helices, B1–B5, which are stabilized by an intraprotomer
C113–C140 disulphide bond (Fig. 2c).

Seven EXP2 protomers (which we labelled A–G) oligomerize to form
a funnel-shaped C7-pseudosymmetric 216-kDa heptamer spanning the
PVM (Fig. 2d, e). The transmembrane domain and body helices B1–B3
are symmetric throughout all seven protomers (Extended Data Fig. 3a, b).
This symmetry is broken by inter-protomer conformational variations
in body helices B4–B5 and the assembly domain, which stretch upwards
in some protomers to maintain contacts with the asymmetric HSP101
hexamer located above the EXP2 funnel. This variation is most
pronounced in EXP2 protomers F and G (Extended Data Fig. 3a, b).

In the EXP2 heptamer, the amphipathic transmembrane helices twist
slightly around each other, creating a 37 Å-long C7-symmetric
protein-conducting channel that spans the PVM and forms the stem of the
funnel (Fig. 2d, e). The membrane-facing surface of the EXP2 channel
is coated with hydrophobic residues, whereas the inner surface is lined
with charged and polar residues, creating an aqueous pore (Fig. 2e).
The body domains, positioned in a wider ring on top of the
transmembrane channel on the vacuolar face of the PVM, form the mouth
of the funnel. This orientation is consistent with previous analyses of
EXP2 topology. Furthermore, a detergent belt is clearly visible in 2D
class averages and density maps (Extended Data Figs. 7, 8), defining the
residues in the transmembrane domain that would be buried in the PVM.
A ring of positively charged residues where the stem meets the mouth of
the funnel is positioned where it can interact with the negatively charged
phosphates of the membrane surface (Extended Data Fig. 8a).


Fig. 2 | EXP2 forms a heptameric pseudo-symmetric PVM-spanning
pore. a, b, Linear schematic (a) and ribbon diagram (b) of the EXP2
monomer in the engaged state. Dashed grey boxes represent unmodelled
regions. Inset, one EXP2 monomer (coloured) within the PTEX complex.
TMD, transmembrane domain. c, Density (mesh) and model of the
C113–C140 disulphide bond. d, EXP2 heptamer, coloured as in b. e, Cutaway of
the EXP2 transmembrane channel with hydrophilic residues (pink) lining
the inner protein-conducting pore and hydrophobic residues (yellow) on
the outer, membrane-facing surface.


PTEX150 forms an adaptor between HSP101 and EXP2
Of the 993 residues in PTEX150, S668–D823 are well-resolved in
the structure, and form a hook with a shaft (Fig. 3a, b). The hook
domain consists of three short helices (H1–H3), which are connected
by several long loops. Directly N-terminal and C-terminal to the
hook domain, the shaft is composed of proximal and distal shaft
domains (Fig. 3a, b). The remaining 80% of PTEX150, which is not
visible in our structures, is predicted to be intrinsically disordered
(in this region the average disorder tendency score in IUPred is 0.83,
with scores above 0.5 indicating disorder), unlike the rigid structured
core of PTEX150(S668–D823) (with an average disorder tendency score of
0.42, indicating ordered structure), suggesting that this 80% of the
protein is too mobile to be observed and may be flexibly arranged outside
the stable PTEX core.
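
The disorder classification described above amounts to averaging per-residue scores over a region and comparing the mean with the 0.5 cutoff. A minimal sketch follows; the per-residue scores are invented placeholders rather than IUPred output, and `mean_disorder` and `classify_region` are hypothetical helper names.

```python
from statistics import mean

DISORDER_CUTOFF = 0.5  # scores above this indicate disorder (per the text)

def mean_disorder(scores, start, end):
    """Mean score over residues start..end (1-based, inclusive)."""
    return mean(scores[start - 1:end])

def classify_region(scores, start, end, cutoff=DISORDER_CUTOFF):
    """Label a region 'disordered' if its mean score exceeds the cutoff."""
    if mean_disorder(scores, start, end) > cutoff:
        return "disordered"
    return "ordered"

# Placeholder profile: 10 residues, a flexible stretch then a structured one.
toy_scores = [0.9, 0.8, 0.85, 0.8, 0.9, 0.4, 0.45, 0.4, 0.35, 0.5]

print(classify_region(toy_scores, 1, 5))   # flexible stretch
print(classify_region(toy_scores, 6, 10))  # structured stretch
```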

Seven PTEX150(S668–D823) hooks (which we labelled a–g)
oligomerize, forming a flange-shaped C7-pseudosymmetric heptamer
(Fig. 3c) that fits into the mouth of the EXP2 channel. Each hook
lies in the groove between adjacent EXP2 body domains, and the
tip of the hook curls down into the mouth of the EXP2 pore
(Fig. 3d). A vertical, heptameric ring of H2 helices sits in the mouth
of the EXP2 funnel, forming a conduit between the hexameric
HSP101 and heptameric EXP2 central pores (Extended Data Fig. 2g–j).
In this way, PTEX150(S668–D823) serves as an adaptor between
HSP101 and EXP2, providing a continuous protected path for unfolded
cargo.


Endogenous cargo is observed in the channel of HSP101 
Class 1 Clp-HSP100 AAA+ ATPases are highly conserved hexameric
protein unfoldases that are associated with diverse functions; they
are known to thread polymeric substrates through a central pore.
HSP101 is a 598-kDa hexamer, exemplifying the canonical class 1
Clp-HSP100 domain architecture, with a substrate-binding N-terminal
domain followed by two AAA+ nucleotide-binding domains (NBD1
and NBD2), each containing a cargo-binding pore loop (L1 and L2,
respectively) that extends into the central pore (Fig. 4a, b). Additionally,
HSP101 contains a C-terminal domain and a coiled-coil middle domain
insertion in the C-terminal end of NBD1 (Fig. 4a, b).

Fig. 3 | PTEX150 forms a heptameric flange-shaped adaptor between
EXP2 and HSP101. a, b, Linear schematic (a) and ribbon diagram (b) of
the PTEX150(668–823) monomer in the engaged state. Dashed grey boxes
represent unmodelled regions. Inset, one PTEX150(668–823) monomer
(coloured) within the PTEX complex. PS, proximal shaft; DS, distal shaft.
c, The PTEX150(668–823) heptamer, coloured as in b. d, Views showing
how one PTEX150(668–823) monomer hooks into the top of the EXP2
funnel.

Unlike class 2 HSP100s, class 1 HSP100s form three-tiered
hexamers, in which the N-terminal domains, NBD1s and NBD2s form
the top, middle and bottom tiers, respectively. In our engaged state
structure, the NBD1 and NBD2 tiers are arranged in a right-handed
ascending spiral (Fig. 4c). A layer of weaker density above the
NBD1 tier may correspond to the N-terminal domains, which are likely
to be dynamic (Extended Data Fig. 8b). The middle domains
encircle the upper NBD1 tier. The central pore of the spiral is lined with
pore loops bearing tyrosines in a spiral staircase pattern. The tyrosine
sidechain densities intercalate with a 45 Å-long density that is clearly
visible in the middle of the chaperone pore (Fig. 4c, d, Supplementary
Video 5), which closely resembles unfolded cargo polypeptide
densities reported in recently published cryo-EM structures of homologous
HSP100s bound to cargo (Extended Data Fig. 2a, d). The unfolded
PTEX cargo polypeptide chain modelled into this 45 Å density matches
very closely (root mean square deviation (r.m.s.d.) of 1.09–1.25 Å) with
the unfolded cargo polypeptides in these cargo-bound homologue
structures (Extended Data Fig. 2a–c).
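
For reference, the r.m.s.d. quoted above is the square root of the mean squared distance between matched atoms. The sketch below computes it for two coordinate lists that are assumed to be already superimposed; the coordinates are made-up values, and which atoms were compared in the actual analysis is not specified here.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two matched, pre-aligned
    lists of (x, y, z) coordinates, in the same units as the input."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equal in length")
    squared = [
        (ax - bx) ** 2 + (ay - by) ** 2 + (az - bz) ** 2
        for (ax, ay, az), (bx, by, bz) in zip(coords_a, coords_b)
    ]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical backbone positions (in Å) for two short peptide traces.
trace_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
trace_b = [(0.0, 1.0, 0.0), (3.8, 1.0, 0.0), (7.6, 1.0, 0.0)]

print(f"r.m.s.d. = {rmsd(trace_a, trace_b):.2f} Å")
```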


Key interactions for PTEX assembly and function 

The three PTEX components share extensive binding interfaces;
here we describe the two most noteworthy interactions. In each EXP2
protomer A–F, the assembly strand augments the C-terminal domain
β-sheet in the HSP101 protomer situated directly above it (Fig. 5a, b).
Protomer G occupies the space below the gap between HSP101
protomers 1 and 6 (Fig. 1f–k). This hydrogen bond interaction tethers
the HSP101 hexamer to the transmembrane funnel, positioning the
central pore exit directly above the entrance to the PTEX150–EXP2
pore. We hypothesized that this interaction is essential for assembly of
the PTEX core complex, and that the complex must be stably assembled
to be active. We tested this using genetic functional complementation
in live P. falciparum.

Fig. 4 | Endogenous cargo bound in the channel of the HSP101
hexamer. a, b, Linear schematic (a) and ribbon diagram (b) of the HSP101
monomer in the engaged state. MD, middle domain; CTD, C-terminal
domain; SSD, small subdomain. c, Side view of the full (left) and bisected
(right) cryo-EM map of the HSP101 hexamer. NBD1 and NBD2 rings are
coloured with light (NBD1) and dark (NBD2) blue gradients to emphasize
the right-handed spiral shape of the hexamer. The bisected map shows
NBD2 pore loop densities coloured by protomer, ATPγS (magenta)
and the cargo density (light pink). d, Enlarged side view of the atomic
models of the HSP101 NBD2 pore loops and unfolded cargo polypeptide
backbone, shown with densities. NBD2 pore loops are coloured as in c and
labelled by protomer (for example, D2PL,P1 is NBD2 pore loop, protomer
1). Vertical distances between pore loop tyrosines in D2PL,P1–D2PL,P6
are 6.52 Å, 6.28 Å, 6.38 Å, 6.96 Å and 6.12 Å, respectively.


Knockdown of EXP2 produces a lethal defect in P. falciparum growth
and export that can be rescued by a mutant version of EXP2 that lacks
the C-terminal 54 residues. Therefore, the amino acids immediately
following the assembly strand are not essential for PTEX function.
However, complementation with a version of EXP2 that lacks an
additional 12 residues, removing the assembly strand, failed to rescue
these phenotypes (Fig. 5c–f). These results demonstrate that the EXP2
assembly strand is critical to PTEX function, consistent with an
essential role for it in docking the HSP101 unfoldase to the EXP2 membrane
channel to facilitate translocation.
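
The residue bookkeeping behind the two truncations can be made explicit. The sketch assumes that full-length EXP2 has 287 residues, which is inferred from the 222–287 deletion construct named in the Fig. 5 caption; the function and variable names are illustrative.

```python
# Assumption: EXP2 is 287 residues long (last residue of the 222-287 deletion).
FULL_LENGTH = 287

def remaining_after_c_terminal_truncation(full_length, n_removed):
    """Return the 1-based (first, last) residues kept after removing
    n_removed residues from the C terminus."""
    return 1, full_length - n_removed

rescuing = remaining_after_c_terminal_truncation(FULL_LENGTH, 54)
non_rescuing = remaining_after_c_terminal_truncation(FULL_LENGTH, 54 + 12)

# Residues present in the rescuing construct but absent from the other one.
extra_deleted = (non_rescuing[1] + 1, rescuing[1])

print("rescuing construct keeps residues", rescuing)
print("non-rescuing construct keeps residues", non_rescuing)
print("additional residues removed:", extra_deleted)
```

Under this assumption the additional 12 residues are 222–233, which sits at the C-terminal end of the modelled region (G27–S234).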

A strong, albeit lower-resolution, claw-shaped density extends from
the end of each modelled PTEX150(S668–D823) shaft to the HSP101
middle domain above, terminating in a three-turn helix that rests on the
midpoint of the middle domain. This helix forms a strong interaction
with Y488 and Y491 of HSP101 in claws a–e (Extended Data Fig. 8d,
e), but is not visible in claw f in the engaged state. Claw g appears to
form an additional interaction with the N-terminal end of the middle
domain of HSP101 protomer 1 (Extended Data Fig. 8d). The middle
domain is known to have a critical role in regulating ATPase and
unfoldase activities in related HSP100s, suggesting that this interaction
is of high functional importance.


Fig. 5 | Interactions essential for PTEX function. a, b, Ribbon (a) and
stick (b) models of the HSP101 C-terminal domain β-sheet augmented
by the EXP2 assembly strand, shown with corresponding cryo-EM
density (mesh). The segment outlined in red was truncated in functional
complementation assays. c, Western blot of lysates from P. falciparum
expressing EXP2 under aptamer control (EXP2^apt) complemented with
EXP2(Δ222–287)-3×Myc (predicted molecular weight 27.8 kDa after
signal peptide cleavage). For blot source data, see Supplementary
Fig. 1. d, Growth analysis of EXP2^apt::EXP2(Δ222–287)-3×Myc.
P. falciparum were grown with or without anhydrotetracycline (aTc)
to maintain or knock down endogenous EXP2 expression, respectively.
One experiment performed with three technical replicates is shown.
Bars show mean exponential growth rate constant (h⁻¹) determined
from the fit of the two independent experiments and error bars indicate
s.d. e, Immunofluorescence assay detecting exported protein SBP1
and HSP101-3×Flag (a marker of the parasitophorous vacuole)
in P. falciparum expressing EXP2^apt::EXP2(Δ222–287)-3×Myc that
were allowed to develop with or without aTc for 24 h after invasion.
The dashed line indicates the traced boundary of the red blood cell.
DIC, differential interference contrast image. f, Quantification of
SBP1 export immunofluorescence assays. Data are pooled from two
independent experiments; n is the number of individual parasite-infected
RBCs. Boxes and whiskers delineate 25th to 75th and 10th to 90th
percentiles, respectively. All P values are determined by an unpaired,
two-sided Student's t-test. All data shown represent two independent
experiments.


Two observed states suggest a translocation mechanism

In addition to the extended 195 Å engaged state of PTEX, we
also observed a more compact 175 Å resetting state. Much of
PTEX150(S668–D823) and EXP2 remain unchanged between the
engaged and resetting states; a hinge-like swinging motion in the
HSP101 hexamer accounts for the 20 Å difference in height. The
transmembrane domain and B1–B3 helices of EXP2 exhibit C7
symmetry, and remain identical between the two states (Supplementary
Video 6). The deviations from C7 symmetry in the B4–B5 helices and
assembly domain are less pronounced in the resetting state (Extended
Data Fig. 3b, c), probably owing to the more planar arrangement of
HSP101 protomers. As in the engaged state, slight inter-protomer
variations in the H2–H3 region of PTEX150(S668–D823) bridge the
gap between EXP2 and HSP101, maintaining a continuous protected
path for unfolded cargo proteins.

The spiral staircase of HSP101 tyrosine pore loops in the
engaged state collapses into a planar C shape in the resetting state
(Supplementary Video 7), with a freedom of movement possibly
conferred by the gap between HSP101 protomers 1 and 6. Starting
at the interface between the NBD2 domains of HSP101 protomers
3 and 4, HSP101 protomers 4–6 swing downwards and outwards,
creating a deep vertical cleft through the central pore of the hexamer.

Fig. 6 | Mechanism of translocation. a, Side views of the HSP101 pore
loops with the unfolded cargo peptide backbone models (dark pink) built
into the cryo-EM densities (light pink). Vertical distances between pore
loop tyrosines in consecutive loops are as follows. Engaged
D1PL,P1–D1PL,P6: 9.41 Å, 8.61 Å, 1.40 Å, 3.34 Å and 2.28 Å, respectively. Engaged
D2PL,P1–D2PL,P6: 6.52 Å, 6.28 Å, 6.38 Å, 6.96 Å and 6.12 Å, respectively.
Resetting D1PL,P1–D1PL,P6: 1.75 Å, −2.70 Å, −1.65 Å, −0.78 Å and
1.81 Å, respectively. Resetting D2PL,P1–D2PL,P6: 5.88 Å, 4.56 Å, −6.80 Å,
2.25 Å and 7.88 Å, respectively. b, Proposed stepwise feeding mechanism
of translocation by PTEX. NBD1 and NBD2 pore loops and cargo are
coloured as in a.
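
The distances listed in the Fig. 6 caption can be summarized to show how the staircase flattens. The sketch below sums and averages the consecutive vertical steps for each pore-loop set; the numbers are copied from the caption, while the summary statistics chosen here are only one illustrative way to compare the two states.

```python
from statistics import mean

# Vertical distances (Å) between tyrosines of consecutive pore loops, P1 to P6.
steps = {
    ("engaged", "D1PL"): [9.41, 8.61, 1.40, 3.34, 2.28],
    ("engaged", "D2PL"): [6.52, 6.28, 6.38, 6.96, 6.12],
    ("resetting", "D1PL"): [1.75, -2.70, -1.65, -0.78, 1.81],
    ("resetting", "D2PL"): [5.88, 4.56, -6.80, 2.25, 7.88],
}

def summarize(values):
    """Return (net rise from P1 to P6, mean step), both in Å."""
    return round(sum(values), 2), round(mean(values), 2)

for (state, loop), values in steps.items():
    net, avg = summarize(values)
    print(f"{state:9s} {loop}: net rise {net:6.2f} Å, mean step {avg:5.2f} Å")
```

With these inputs the engaged NBD2 loops step by a near-uniform 6.5 Å or so, whereas the resetting NBD1 steps nearly cancel out, consistent with the collapse into a planar arrangement described in the text.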


This movement pulls the NBD2 loops in protomers 4–6 away from
the unfolded cargo (Extended Data Fig. 3d, e, h, i, Supplementary
Videos 6, 7). A shorter (19 Å versus 45 Å), unfolded cargo density
remains visible, bound to the NBD2 loops in protomers 1–3, whereas
no peptide density is visible in protomers 4–6 (Fig. 6a, Extended Data
Fig. 3h, i). Furthermore, the NBD1 domain of protomer 3 rotates
outward, such that the R361 arginine finger remains within 5.2 Å of the
ATPγS bound to NBD1 of protomer 4, and the nucleotide in NBD2 of
protomer 4 shifts 7.5 Å away from the R859 arginine finger in protomer
3 (Extended Data Fig. 3f, g).


Discussion 

We propose a PTEX-mediated mechanism of protein translocation
via a cyclic process involving at least two discrete states (Fig. 6b,
Supplementary Videos 6, 7), which we have captured by purifying
PTEX complexes directly from P. falciparum that are actively
translocating cargo. The pore loops in HSP101 NBD2 form two ‘hands’ that work
together to thread the cargo protein through the central pore. NBD2
loops from HSP101 protomers 1–3 form the passive hand, located
closest to the PTEX150(S668–D823)–EXP2 funnel, which remains fixed
between states (Fig. 6a, b). NBD2 loops from HSP101 protomers 4–6
form the active hand, which moves along the channel axis (above the
passive hand), grasping the unfolding peptide and feeding it through
the passive hand. In the engaged state, all six NBD2 pore loops grip
the unfolded peptide in the spiral staircase formation (Fig. 6a, b). As
the HSP101 hexamer collapses into the resetting state, the active hand
moves downwards, feeding the newly unfolded peptide through the
passive hand, into the PTEX150(S668–D823)–EXP2 funnel below. The
passive hand then grips the unfolded peptide, preventing it from
slipping back towards the HSP101 apical entrance while the active hand
swings outward, releasing the cargo (Fig. 6a, b). Finally, the active
hand moves upwards to grasp the unfolding protein further upstream,
transitioning back to the engaged state. This cyclic feeding mechanism
threads the unfolded cargo protein through the translocon, across the
PVM and into the cytosol of the host cell.

The states captured here may be two of several states in the processive
phase of translocation. Additional states are likely to exist for cargo
recognition. Although we did not observe PTEX-free HSP101 oligomers as
has been suggested, we did observe additional, seemingly cargo-free
PTEX complexes (Extended Data Fig. 9) that did not refine to better
than 7 Å, suggesting that there is conformational heterogeneity in the
absence of stabilizing cargo interactions. Cargo–PTEX interactions
during cargo recognition may be transient, possibly explaining why we
did not observe the N-terminal domains of HSP101, or other
components that are potentially required for cargo recognition. Without these
details, the mechanisms of cargo recognition and subsequent refolding
after translocation remain unclear, although there is some evidence for
the involvement of exported parasite chaperones or co-opted host
chaperonins in these processes. Of note, on the basis of secondary
structure prediction and PTEX150 truncation experiments, PTEX150
residues D838–F912 may occupy the claw (PTEX150 residues
D838–E873) and three-turn helix (PTEX150 residues S884–F912) densities
that remain unassigned in our structures.

Our work demonstrates the advantages of obtaining structures 
of challenging protein complexes in functionally relevant states by 
imaging samples purified directly from endogenous sources. Direct 
observation of the native PTEX core provides compelling evidence that 
this complex containing EXP2, PTEX150 and HSP101 is a bona fide 
translocon that is embedded in the PVM and serves as the gateway for 
the malaria parasite exportome. In addition to establishing the role of 
EXP2 as the membrane-spanning pore of PTEX and providing insight 
into the mechanism of this essential protein translocating machine, our 
structures reveal an interaction between the EXP2 assembly domain 
and the C-terminal domain of HSP101 that is indispensable for PTEX 
function. These atomic structures of PTEX provide potential targets 
for designing a new class of drugs to inhibit this essential gatekeeper 
of the malarial exportome. 


METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized. The investigators were not blinded to allocation during 
experiments and outcome assessment. 

Cells. P. falciparum strain NF54^attB was obtained from the Fidock laboratory
where it was generated and was used exclusively in the study. De-identified,
IRB-exempt expired RBCs were obtained from the blood bank at the St. Louis
Children’s Hospital. PCR amplified regions from the NF54^attB genome were found
to match the genome sequence for 3D7, a subclone of NF54. The presence of the
cg6 localized attB sequence was verified by successful Bxb1-mediated integration
at that site. Cell lines were not tested for mycoplasma contamination.

P. falciparum culture and genetic modification for PTEX purification.
P. falciparum culture was performed as described with the exception
that RPMI was supplemented with 0.5% Albumax I. All plasmid
construction was carried out by In-Fusion cloning (Clontech) unless otherwise noted.
Integration of a 3×Flag fusion at the endogenous HSP101 C terminus was
accomplished with CRISPR-Cas9 editing. A Cas9 target site was chosen
just upstream of the hsp101 stop codon (TAATAGTAAAGCTAAAAACT)
and the guide RNA seed sequence was synthesized as a sense and anti-sense
primer pair (sense shown)
5′-TAAGTATATAATATTTAATAGTAAAGCTAAAAACTGTTTTAGAGCTAGAA-3′,
annealed and inserted into the BtgZI site
of the plasmid pAIO, resulting in the plasmid pAIO-HSP101-CT-gRNA1.
A 5′-homology flank (up to but not including the stop codon) was amplified from
P. falciparum NF54^attB genomic DNA using primers
5′-GACGCGAGGAAAATTAGCATGCATCCTTAAGGAGATTCTGGTATGCCACTTGGTTC-3′ and
5′-CTGCACCTGGCCTAGGGGTCTTAGATAAGTTTATAACTAAGTTTTTAGCTTTACTATT-3′,
incorporating a synonymous shield mutation in the protospacer
adjacent motif of the gRNA target site within the hsp101 coding
sequence. A 3′-homology flank (beginning 3 bp downstream of the stop codon)
was amplified using primers
5′-CACTATAGAACTCGAGAATTACGCATATATATATATATATATATATATAACATGGGTTG-3′ and
5′-GAACCAAGTGGCATACCAGAATCTCCTTAAGGATGCATGCTAATTTTCCTCGCGTC-3′. The
flank amplicons were assembled in a second PCR reaction using primers
5′-CACTATAGAACTCGAGAATTACGCATATATATATATATATATATATATAACATGGGTTG-3′ and
5′-CTGCACCTGGCCTAGGGGTCTTAGATAAGTTTATAACTAAGTTTTTAGCTTTACTATT-3′
and inserted between XhoI and AvrII
in pPM2GT. The GFP tag between AvrII and EagI in this vector was then
replaced with sequence encoding a 3×Flag tag using the primer
5′-CTTAGTTATAAACTTATCTAAGACCCCTAGGGACTACAAGGACGACGACGACAAGGATTATAAAGATGATGATGATAAAGATTATAAAGATGATGATGATAAATGACGGCCGCGTCGAGTTATATAATATATTTATG-3′
and a QuikChange
Lightning Multi Site-Directed Mutagenesis kit (Agilent), resulting in the plasmid
pPM2GT-HSP101-3×Flag. This plasmid was linearized at the AflII site between
the 3′- and 5′-homology flanks and co-transfected with
pAIO-HSP101-CT-gRNA1 into P. falciparum NF54^attB parasites. Selection with 10 nM
WR99210 was applied 24 h after transfection. Once P. falciparum returned
from selection, integration at the intended site was confirmed by PCR
with primers 5′-CGAAAACTTTTATGGTATTAATATAACAG-3′ and
5′-CCTTGTCGTCGTCGTCCTTG-3′ and a clonal line was isolated by limiting
dilution.
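
A few relationships among the oligonucleotides above can be checked mechanically. The sketch below uses sequences copied from this paragraph and tests whether the Cas9 target lies inside the guide primer, whether the 5′-flank forward primer and the 3′-flank reverse primer are reverse complements, whether that shared stretch carries an AflII recognition site (CTTAAG), and what the tag-encoding part of the mutagenesis primer translates to. The helper names are illustrative, the codon table covers only the codons present in the tag, and reading the primer complementarity as the overlap used to join the flanks is an inference rather than something the text states.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

# Sequences copied from the text (5' to 3').
cas9_target = "TAATAGTAAAGCTAAAAACT"
grna_sense_primer = "TAAGTATATAATATTTAATAGTAAAGCTAAAAACTGTTTTAGAGCTAGAA"
flank5_forward = "GACGCGAGGAAAATTAGCATGCATCCTTAAGGAGATTCTGGTATGCCACTTGGTTC"
flank3_reverse = "GAACCAAGTGGCATACCAGAATCTCCTTAAGGATGCATGCTAATTTTCCTCGCGTC"
# Stretch of the mutagenesis primer between the AvrII and EagI sites.
flag_coding = (
    "GACTACAAGGACGACGACGACAAG"
    "GATTATAAAGATGATGATGATAAA"
    "GATTATAAAGATGATGATGATAAA"
    "TGA"
)

# Only the codons that occur in the tag; not a full genetic code table.
CODONS = {"GAC": "D", "GAT": "D", "TAC": "Y", "TAT": "Y",
          "AAG": "K", "AAA": "K", "TGA": "*"}

def translate(seq):
    """Translate in frame 1 using the partial CODONS table above."""
    return "".join(CODONS[seq[i:i + 3]] for i in range(0, len(seq), 3))

# 1. The 20-nt target sits inside the guide primer.
print(cas9_target in grna_sense_primer)
# 2. The two flank primers are reverse complements of each other.
print(reverse_complement(flank3_reverse) == flank5_forward)
# 3. The shared stretch contains an AflII recognition site.
print("CTTAAG" in flank5_forward)
# 4. The tag-encoding stretch as a peptide ('*' marks the stop codon).
print(translate(flag_coding))
```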

For PTEX purification, P. falciparum expressing HSP101-3×Flag were
synchronized by serial treatment with 5% w/v D-sorbitol and then expanded while shaking
to increase singlet invasion events and maintain synchrony. For each preparation,
~2 x 10° erythrocytes infected with P. falciparum were collected at the ring stage
(typically ~500 ml 2% haematocrit culture at ~20% parasitaemia). Erythrocytes
were lysed in 10× pellet volume of cold phosphate-buffered saline (PBS)
containing 0.0125% saponin (Sigma, sapogenin content > 10%) and EDTA-free protease
inhibitor cocktail (Roche or Pierce). Released P. falciparum were washed in cold
PBS containing EDTA-free protease inhibitor cocktail and washed cell pellets were
frozen in liquid nitrogen and stored at −80 °C.
Affinity purification of PTEX core complex from P. falciparum pellets. Frozen
P. falciparum pellets were resuspended in lysis buffer (25 mM HEPES pH 7.4,
10 mM MgCl2, 150 mM KCl, 10% glycerol) and homogenized using a glass Dounce
tissue homogenizer. The membrane fraction was isolated from the homogenized
lysate by centrifugation at 100,000g for 1 h. The membrane pellet was solubilized
in solubilization buffer (25 mM HEPES pH 7.4, 10 mM MgCl2, 150 mM KCl, 10%
glycerol, 0.4% Triton X-100) and the solubilized membranes were then applied to
anti-Flag M2 affinity gel resin (Sigma). The resin was washed extensively in wash
buffer (25 mM HEPES pH 7.4, 10 mM MgCl2, 150 mM KCl, 10% glycerol, 0.015%
Triton X-100), after which the protein was eluted from the affinity resin with elution
buffer (25 mM HEPES pH 7.4, 10 mM MgCl2, 150 mM KCl, 2 mM ATPγS, 0.015%
Triton X-100, 500 µg/ml Flag peptide).
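
When buffers such as these are prepared from concentrated stocks, each component volume follows from C1·V1 = C2·V2. The sketch below works this out for the lysis buffer; the stock concentrations and the 50 ml batch size are assumptions chosen for illustration and are not specified in the text.

```python
def stock_volume_ml(final_conc, stock_conc, final_volume_ml):
    """Volume of stock needed to reach final_conc in final_volume_ml
    (C1*V1 = C2*V2; both concentrations in the same units)."""
    if stock_conc <= 0 or final_conc > stock_conc:
        raise ValueError("stock must be more concentrated than the target")
    return final_conc * final_volume_ml / stock_conc

FINAL_VOLUME_ML = 50.0  # assumed batch size

# (component, final concentration, assumed stock concentration, unit)
lysis_buffer = [
    ("HEPES pH 7.4", 25.0, 1000.0, "mM"),
    ("MgCl2", 10.0, 1000.0, "mM"),
    ("KCl", 150.0, 3000.0, "mM"),
    ("glycerol", 10.0, 50.0, "%"),
]

volumes = {
    name: stock_volume_ml(final, stock, FINAL_VOLUME_ML)
    for name, final, stock, _unit in lysis_buffer
}
water_ml = FINAL_VOLUME_ML - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name:13s} {vol:5.2f} ml")
print(f"{'water':13s} {water_ml:5.2f} ml")
```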

The presence and relative abundance of the three PTEX core components were
verified by silver stained SDS-PAGE and tryptic digest liquid
chromatography-mass spectrometry (Extended Data Fig. 1e, f). The extremely low yields achievable
when purifying PTEX directly from P. falciparum prohibited the conventional
approach of evaluating sample quality by size exclusion chromatography. Thus,
during the iterative process of screening for optimal purification conditions, sample
quality was assessed by negative stain (uranyl acetate) transmission electron
microscopy in an FEI TF20 microscope equipped with a TVIPS 16 mega-pixel
CCD camera. In brief, small datasets of ~100,000 particles were collected and 2D
class averages were generated in RELION to assess the presence of sufficient
numbers of intact PTEX particles yielding ‘good’ class averages exhibiting
distinct features. For example, C7 symmetry could be recognized in top views, and
the characteristic Clp/HSP100 layers were visible in side views (Extended Data
Fig. 6a–c).

Cryo-electron microscopy. Three-microlitre aliquots of purified PTEX core 
complex were applied to glow-discharged lacey carbon grids with a supporting 
ultrathin carbon film (Ted Pella). Grids were then blotted with filter paper and 
vitrified in liquid ethane using an FEI Vitrobot Mark IV or a home-made manual 
plunger. Cryo-EM grids were screened in an FEI Tecnai TF20 transmission electron 
microscope while optimizing freezing conditions. 

Higher resolution cryo-EM images were collected on a Gatan K2-Summit direct
electron detector in counting mode on an FEI Titan Krios at 300 kV equipped with
a Gatan Quantum energy filter set at a 20 eV slit width. Fifty frames were recorded
for each movie at a pixel size of 1.04 Å at the specimen scale, with a 200-ms
exposure time and an average dose rate of 1.2 electrons per Å² per frame, resulting in a
total dose of 60 electrons per Å² per movie. The final dataset consists of a total of
25,000 movies recorded in four separate sessions.
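
The acquisition numbers above are linked by simple arithmetic, sketched below. The 200-ms exposure is read here as the exposure per frame, which is an assumption (the text does not say whether it applies per frame or per movie); the derived dose rate per pixel is included only as a plausibility check under that reading.

```python
FRAMES_PER_MOVIE = 50
FRAME_EXPOSURE_S = 0.200   # 200 ms, assumed to be the per-frame exposure
DOSE_PER_FRAME = 1.2       # electrons per Å^2 per frame
PIXEL_SIZE_A = 1.04        # Å per pixel at the specimen

# Total dose per movie, in electrons per Å^2.
total_dose = FRAMES_PER_MOVIE * DOSE_PER_FRAME
# Total exposure per movie, in seconds (under the per-frame assumption).
movie_exposure_s = FRAMES_PER_MOVIE * FRAME_EXPOSURE_S
# Dose rate on the detector, in electrons per pixel per second.
dose_rate_per_pixel = DOSE_PER_FRAME * PIXEL_SIZE_A ** 2 / FRAME_EXPOSURE_S

print(f"total dose: {total_dose:.0f} e/Å^2 per movie")
print(f"movie exposure: {movie_exposure_s:.0f} s")
print(f"dose rate: {dose_rate_per_pixel:.2f} e/pixel/s")
```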
Image processing and 3D reconstruction. Frames in each movie were aligned,
gain reference-corrected and dose-weighted to generate a micrograph using
MotionCor2. Aligned and un-dose-weighted micrographs were also generated
and used for contrast transfer function (CTF) estimation using CTFFIND4 and
PTEX particle picking by hand and using Gautomatch
(https://www.mrc-lmb.cam.ac.uk/kzhang/Gautomatch/Gautomatch_Brief_Manual.pdf).

In total, 1,508,462 particles were extracted from 19,752 micrographs and
initially binned by a factor of 2. After two rounds of reference-free two-dimensional
(2D) classification in RELION, 422,713 particles were selected as ‘good’ particles
from distinct 2D class averages representing different views of the PTEX core
complex. These particles were then used in a one-class ab initio reconstruction followed
by homogeneous refinement in CryoSPARC, yielding a 4.8 Å ab initio 3D map.

The original 422,713 ‘good’ particles were then aligned in a 3D refinement in
RELION using the 4.8 Å CryoSPARC map as an initial reference. All subsequent
image-processing steps were performed using RELION. After this refinement, the
particles were unbinned, their centres recalculated and used to re-extract particles
from the original micrographs without binning. The newly extracted, unbinned
particles were then aligned with a second 3D refinement yielding a ~4.5 Å
reconstruction.
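
The effect of binning on attainable resolution, and the particle bookkeeping above, can be sketched as follows. The 1.04 Å pixel size is the value given for data collection; the Nyquist limit is taken as twice the pixel size, and the helper name is illustrative.

```python
PIXEL_SIZE_A = 1.04   # Å per pixel, unbinned (from the data collection settings)
BIN_FACTOR = 2

def nyquist_limit(pixel_size_a):
    """Finest resolvable spacing (Å) for a given pixel size: two pixels."""
    return 2.0 * pixel_size_a

extracted = 1_508_462
micrographs = 19_752
selected = 422_713

binned_pixel = PIXEL_SIZE_A * BIN_FACTOR
print(f"binned pixel size: {binned_pixel:.2f} Å")
print(f"Nyquist limit, binned: {nyquist_limit(binned_pixel):.2f} Å")
print(f"Nyquist limit, unbinned: {nyquist_limit(PIXEL_SIZE_A):.2f} Å")
print(f"particles per micrograph: {extracted / micrographs:.1f}")
print(f"fraction kept after 2D classification: {selected / extracted:.1%}")
```

In this sketch the twofold-binned data cannot carry information beyond 4.16 Å, which is one reason to re-extract unbinned particles before pushing refinement further.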

An exhaustive, iterative search of classification and refinement conditions was used
to sort out different conformations and further improve resolution (Extended Data
Fig. 9). In brief, upon further sorting using 3D-classification without alignment,
we identified two homogenous particle subsets corresponding to the engaged and
resetting states (Extended Data Fig. 9). Particles in the two subsets were refined
separately, yielding full maps with overall resolutions of 4.16 Å and 4.23 Å, respectively.

Focused 3D classification without alignment followed by focused refinement
was used to further improve the resolution of mobile regions of the structure in
both states. C7 symmetry was applied in the focused 3D classification and
refinement steps of the heptameric halves, comprising EXP2 and PTEX150, yielding a
3.4 Å engaged state map and a 3.5 Å resetting state map (Extended Data Figs. 3,
9). The same procedure, except with C1 symmetry, was applied to the hexameric
half of the engaged state, yielding a 4.09 Å map (Extended Data Figs. 3, 9). This
last step was also applied to the hexameric half of the resetting state, but did not
yield improvements in resolution. Further efforts of focused 3D classification
and refinement of individual HSP101 protomers, individual claws, and HSP101
N-terminal domain densities in the two states did not ultimately yield
improvements in resolution in either state.

Model building and refinement. Map interpretation was performed with UCSF
Chimera and COOT. P. falciparum protein sequences were obtained from the
National Center for Biotechnology Information (NCBI) and the PlasmoDB
protein databases. PHYRE2 secondary structure predictions were used as an aid
for initial manual sequence registration. Models for a single monomer of HSP101,
PTEX150, and EXP2 in the engaged state were all built de novo. This first model for
each protein monomer was then placed into the density maps of other protomers
to aid de novo modelling of subsequent protomers. Individual protomers in the
complex were then manually remodelled to ensure a close fit between densities and
models. The same process was repeated for the resetting state. Manual refinement
targeting protein geometry alone was done primarily along the periphery and
flexible regions of the complex (for example, the M domains of HSP101). Whereas
their densities and backbone traces were visible, we were unable to model the
claw with its connected three-turn helix, nor one of the 12 M domain loops in the
resetting state (Fig. 5g, h). The three-turn helix displayed a few bulky side chains
interacting with the M domain of HSP101; however, the lack of backbone
connection to our atomic model of the complex and the limited visibility of smaller side
chains in this region made sequence assignment challenging.

Manual refinement targeting both protein geometry and fit with the density 
map was used primarily in the core regions where resolution was higher and noise 
was minimal. Rotamers were fitted manually in COOT and improved using the 
‘back-rub rotamers’ setting. The resulting models for the complexes were
subjected to the phenix.real_space_refine program in PHENIX. Following this step,
MolProbity reported less than ideal clash scores and map-to-model
cross-correlation. To improve the geometry and fit, manual adjustments were made
to protein geometry and density map fit, with the additional step of using
MolProbity clash
dots and sphere-refinement in COOT. 

The complex was then broken into three portions: (1) symmetric regions of 
EXP2 and PTEX150, (2) HSP101, and (3) the full PTEX complex. These model 
segments were fed back to phenix.geometry_minimization in PHENIX and then 
to phenix.real_space_refine using simulated annealing and global minimization 
applying Emsley’s Ramachandran restraints. Following another round of manual
checks and improvements, all models were subjected to phenix.real_space_refine 
with default settings one last time. 

All figures and videos were prepared with UCSF Chimera, PyMOL
(https://www.pymol.org), and ResMap. MolProbity was used to validate the
stereochemistry of the final models.

Genetic complementation. For expression of a complementing second copy of
truncated EXP2, the exp2 coding sequence up to codon position 221 was amplified
with primers
5′-CGAATAAACACGATTTTTTCTCGAGATGAAAGTCAGTTATATATTTTCCTTTTTTTTGTTATTCTTCG-3′
and 5′-AATCAACTTTTGTTCGCTAGCTTTCTTTGATTCCATAGATTTCAATTTCTCTTCC-3′ and
inserted into the plasmid pyEOE-attP-EXP2-3×Myc between XhoI and NheI,
resulting in the plasmid pyEOE-attP-EXP2Δ222-287-3×Myc. This plasmid was
co-transfected with pINT into EXP2apt::HSP101-3×Flag conditional knockdown
parasites at the mature schizont stage using a Nucleofector 2b and Basic
Parasite Nucleofector kit 2 (Lonza). Selection with 2 μM DSM1 was applied 24 h
post-transfection (in addition to 2.5 μg/ml blasticidin S and 1 μM
anhydrotetracycline for maintenance of endogenous EXP2 translational control by
the aptamer system) to facilitate integration into the attP site engineered in
the benign cg6 locus through integrase-mediated attB × attP recombination.
Following return from selection, P. falciparum were cloned by limiting dilution,
and expression of EXP2(Δ222-287)-3×Myc was confirmed by western blot.

P. falciparum growth assays. EXP2apt::EXP2(Δ222-287) P. falciparum were
extensively washed to remove aTc and plated with or without 1 μM aTc in
triplicate at an initial parasitaemia of 1%. The medium was changed every 48 h
and 1:1 subculture was performed every other day beginning on day 4 to avoid
culture overgrowth. Parasitaemia (percentage of total red blood cells (RBCs)
infected) was measured every 24 h by flow cytometry on a FACSCanto (BD
Biosciences) by nucleic acid staining of cultured RBCs with PBS containing
0.8 μg/ml acridine orange. Cumulative parasitaemias were back-calculated based
on the subculture schedule and data were fitted to an exponential growth
equation to determine rate constants using Prism (GraphPad).
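
The back-calculation and fit described above can be illustrated with a minimal Python sketch. It is not the Prism procedure: it assumes that each 1:1 subculture halves the culture (dilution factor 2), that each split is made after that day's reading, and it substitutes a log-linear least-squares fit for a nonlinear fit. The function names and the readings are hypothetical.

```python
import math

def cumulative_parasitaemia(days, measured, split_days, split_factor=2.0):
    """Scale each reading by the dilutions applied before it was taken.

    Assumption: a split on day s is made after that day's reading, so it
    only affects readings on later days.
    """
    cumulative = []
    for day, value in zip(days, measured):
        n_splits = sum(1 for s in split_days if s < day)
        cumulative.append(value * split_factor ** n_splits)
    return cumulative

def fit_exponential(days, values):
    """Fit y = y0 * exp(k * t) by least squares on ln(y); returns (y0, k)."""
    logs = [math.log(v) for v in values]
    n = len(days)
    mean_t = sum(days) / n
    mean_l = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in days)
    sxy = sum((t - mean_t) * (l - mean_l) for t, l in zip(days, logs))
    k = sxy / sxx
    y0 = math.exp(mean_l - k * mean_t)
    return y0, k

# Hypothetical readings generated from y = 1.0 * exp(0.5 * t), halved at each split.
days = list(range(9))
split_days = [4, 6, 8]  # every other day beginning on day 4
measured = [
    math.exp(0.5 * t) / 2.0 ** sum(1 for s in split_days if s < t) for t in days
]

cumulative = cumulative_parasitaemia(days, measured, split_days)
y0, k = fit_exponential(days, cumulative)
```

With these synthetic readings the fit recovers the rate constant used to generate them. Real data would carry measurement noise, and a nonlinear fit on untransformed values weights the points differently.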

Quantification of protein export. For evaluation of protein export by immuno- 
fluorescence assay (IFA), mature schizonts were purified on a magnetic column 
and allowed to invade fresh, uninfected RBCs with shaking for 3 h before
treatment with 5% w/v D-sorbitol to destroy unruptured schizonts. Pulse-invaded
cells were plated with or without 1 μM aTc and allowed to develop 24 h
post-invasion.
Thin smears of infected RBCs were briefly air dried and immediately fixed in 
ice-cold acetone for 2 min. After fixation, samples were blocked for 30 min in 
PBS + 3% BSA followed by incubation for 1 h with primary antibody solutions 
containing mouse anti-Flag M2 monoclonal antibody (detecting HSP101-3×Flag
to mark the PVM) and rabbit anti-SBP1. After washing, secondary antibody incu- 
bation was carried out for 1 h with Alexa Fluor anti-mouse 488 and anti-rabbit 
594 IgG antibodies (Life Technologies), each diluted 1:2,000. After final wash- 
ing, coverslips were mounted over each sample using Pro-long antifade Gold 
with DAPI (Life Technologies). Images were collected with an ORCA-ER CCD 
camera (Hamamatsu) using AxioVision software on an Axio Imager.M1 microscope
(Zeiss) with a 100× oil-immersion objective using the same exposure times for
each image (300 ms for SBP1-594, 150 ms for Flag-488). Ten images were acquired 
for each condition using the DAPI channel for field selection to avoid bias. Images 
were then analysed using Volocity 6.3 (PerkinElmer). The border of each single- 
infected erythrocyte was traced using the DIC channel as a guide to define a region 
of interest (ROI). The PVM was marked using the ‘find objects’ measurement tool
for the HSP101-3×Flag-488 channel (automatic threshold setting with threshold
offset set to −30% and minimum object size set to 0.5 μm). Individual Maurer’s
clefts were identified using the ‘find spots’ measurement tool for the SBP1-594
channel (offset minimum spot intensity set to 40% and brightest spot within radius
set to 0.5 μm). All spots within the PVM object boundary were then removed
using the ‘subtract’ measurement tool and the number and fluorescence intensity
of the remaining spots in each ROI were collected. Data were pooled from two
independent experiments and plotted with Prism. 
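
The ‘subtract’ step above amounts to discarding every detected spot that falls inside the PVM object and summarizing what remains. The sketch below illustrates that logic only. It is not Volocity's implementation: the pixel-set representation of the PVM object, the function name and the example values are all hypothetical.

```python
def exported_spot_summary(spots, pvm_pixels):
    """Count spots outside the PVM object and sum their intensities.

    spots: list of (x, y, intensity) tuples for one region of interest.
    pvm_pixels: set of (x, y) integer pixel coordinates inside the PVM object.
    """
    remaining = [
        spot for spot in spots
        if (round(spot[0]), round(spot[1])) not in pvm_pixels
    ]
    total_intensity = sum(spot[2] for spot in remaining)
    return len(remaining), total_intensity

# Hypothetical ROI: a 3 x 3 pixel PVM object and four detected spots.
pvm_pixels = {(x, y) for x in range(10, 13) for y in range(10, 13)}
spots = [
    (11.2, 10.9, 850.0),  # inside the PVM object, removed
    (20.0, 5.0, 400.0),   # exported, kept
    (3.4, 17.8, 650.0),   # exported, kept
    (12.0, 12.0, 300.0),  # on the PVM object, removed
]

n_spots, total_intensity = exported_spot_summary(spots, pvm_pixels)
```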

Antibodies. The following primary antibodies were used for IFA and western blot: 
mouse anti-Flag monoclonal antibody clone M2 (Sigma) (IFA, 1:500; western blot, 
1:500); rabbit polyclonal anti-SBP1 (IFA, 1:500); mouse anti-cMYC monoclonal
antibody 9E10 (ThermoFisher) (western blot, 1:300). 

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The atomic models have been deposited to the Protein Data 
Bank under accession numbers 6E10 and 6E11 and the cryo-EM density maps have 
been deposited in the Electron Microscopy Data Bank under accession numbers 
EMD-8951 and EMD-8952. 
Extended Data Fig. 1 | Generation of HSP101-3×Flag P. falciparum and analysis
of purified PTEX. a, Schematic showing strategy for endogenous tagging of
P. falciparum hsp101 with 3×Flag using CRISPR-Cas9 editing. Diagnostic PCR
primers and expected amplicon following successful integration are shown. sgRNA,
single-guide RNA; UTR, untranslated region; CAM, calmodulin promoter; PfU6,
P. falciparum U6 promoter; hDHFR, human dihydrofolate reductase. b, Diagnostic
PCR with genomic DNA template from NF54attB parent or two independent
populations of HSP101-3×Flag P. falciparum. The experiment was performed once.
c, Western blot of NF54attB and HSP101-3×Flag P. falciparum probed with mouse
Flag M2 antibody (Sigma) and goat … full-length HSP101-3×Flag (predicted
molecular weight 102.9 kDa after signal peptide cleavage). Data represent two
independent experiments. d, Giemsa staining of parasite-infected human
erythrocytes from which PTEX was purified. Scale bar, 5 μm. For source data, see
Supplementary Fig. 3. e, Silver-stained SDS-PAGE of the Flag-purified PTEX
sample. Identities of the bands labelled EXP2, PTEX150 and HSP101 were confirmed
by tryptic digest liquid chromatography–mass spectrometry (LC-MS). f, Tryptic
digest LC-MS analysis of the Flag-purified PTEX sample. The PTEX core components
are among the five most abundant species detected in the purified sample. For
gel and blot source data, see Supplementary Fig. 1.
Extended Data Fig. 2 | Detailed views of the PTEX protein-conducting
channel and symmetry mismatch in the engaged state. a–c, Cryo-EM
densities and atomic models of cargo and pore loops from the near-atomic
resolution structures of Clp/HSP100 ATPases YME1 (a), PTEX HSP101
(b) and HSP104 (c). Tyrosine sidechain densities are clearly visible
intercalating with the cargo densities. The modelled engaged state PTEX
cargo has a calculated r.m.s.d. of 1.09 Å and 1.25 Å to the published YME1
and HSP104 cargo models, respectively. Pore loops are labelled by NBD
and protomer (for example, D2PL,P1 is NBD2 pore loop, protomer 1).
d, Side view of the bisected engaged state PTEX cryo-EM map. The
protein-conducting channel, calculated using HOLE, is shown
superimposed over the bisected map in translucent white with a navy
outline. The HSP101 NBD2 pore loop densities are coloured by HSP101
protomer, and the cargo density is coloured pink. e–j, The transition from
the asymmetric HSP101 spiral to the C7-pseudosymmetric
PTEX150(668–823)-EXP2 heptamer is depicted using a series of cross-sections
taken perpendicular to the central axis of the translocon, spanning the area of
symmetry mismatch. The section of the translocon corresponding to each
cross-sectional image is indicated with a bracket in d.
Extended Data Fig. 3 | Detailed comparisons of the engaged and
resetting states. a, Side and top views of the EXP2 heptamer in the
engaged state. Symmetric portions that remain constant between
protomers are coloured in mint. Portions that vary between protomers
are coloured and labelled by protomer. b, c, Superposition of the seven
EXP2 protomers, labelled A–G, in the engaged (b) and resetting (c) states,
coloured as in a. d, e, Top view of HSP101 NBD1 (d) and NBD2 (e) in the
engaged and resetting states, shown in simplified surface representation.
The hinge point at the interface between HSP101 protomers 3 and 4 is
indicated. f, g, Ribbon diagrams of the resetting state (f) and engaged
state (g) nucleotide binding pockets are shown for each protomer. ATPγS
in each pocket is shown with corresponding cryo-EM density (mesh).
The R859 arginine finger (sidechain shown in red-orange) is positioned
approximately 3–5.5 Å from the phosphorus atom in the γ-phosphate
of the ATP in the binding pocket of the neighbouring protomer in all
protomers except R859 in protomer 3 in the resetting state (sidechain
shown in gold), where the ATPγS bound in the protomer 4 NBD2
nucleotide pocket has shifted approximately 7.5 Å away from the protomer
3 R859 arginine finger. h, i, Enlarged side view of the atomic models of
the HSP101 NBD2 pore loops and unfolded cargo polypeptide backbone
in the engaged (h) and resetting (i) states, shown with corresponding
cryo-EM densities. Tyrosine sidechain densities are clearly visible
intercalating with the cargo densities. The modelled PTEX cargo has
a calculated r.m.s.d. of 1.09 Å and 1.25 Å to the published YME1 and
HSP104 cargo models, respectively. Pore loops are labelled by NBD and
protomer (for example, D2PL,P1 is NBD2 pore loop, protomer 1).
Extended Data Fig. 6 | Experimentally determined secondary structure
elements and detected mass-spectrometry fragments mapped to the
primary sequences of the three PTEX proteins. For EXP2 (a), PTEX150
(b) and HSP101 (c), secondary structure elements are shown as tubes
(helices), lines (loops), and arrows (strands) above the corresponding
sequence and are coloured as in Figs. 2a, 3a, 4a. In the sequences shown
below, residues resolved in our structures are coloured according to
protein colours in Fig. 1c–f: EXP2 (mint), PTEX150 (salmon) and HSP101
(cornflower). Signal peptide residues are coloured gold. All residues in
the mature proteins that are not resolved in our structures are shown
in grey. The 3×Flag residues at the C terminus of HSP101 are coloured
green. Peptides detected in tryptic digest liquid chromatography-tandem
mass spectrometry (LC-MS/MS) analysis of the purified PTEX sample
are shown as black lines below the corresponding sequences. Arrowheads
above the EXP2 sequence indicate truncation sites described in this work
and previously immediately before (Δ222-287, red arrowhead) and after
(Δ234-287, green arrowhead) the assembly strand. Arrowheads above
the PTEX150 sequence indicate previously described truncation sites
(Δ847-993, red arrowhead; Δ869-993, green arrowhead).
Extended Data Fig. 7 | Electron microscopy of the PTEX core complex.
a–c, Representative negative stain micrograph (a), enlarged portion of
micrograph (b) and two-dimensional class averages (c) of the PTEX
core complex in multiple orientations. d, e, Representative cryo-EM
micrograph (d) and two-dimensional class averages (e) of the PTEX
core complex in multiple orientations. Individual particles in d are
circled in yellow (top views) and green (side and oblique views) (for
source data, see Supplementary Fig. 4). Arrow in upper left panel of e
indicates the detergent belt, which is visible as a less-dense (dimmer) halo
surrounding the denser (brighter) densities of the α-helices visible in the
transmembrane domain in side views. Scale bars: 700 Å (a), 700 Å (b),
100 Å (c), 200 Å (d) and 100 Å (e).
Extended Data Fig. 8 | Detergent belt, amino-terminal domain, and
claw densities visible in maps at lower thresholds. a, The engaged
state PTEX150-EXP2 heptamer, displayed in surface representation and
coloured by electrostatic potential. The bottom half of the full engaged
state density map is superimposed, showing the location of the detergent
belt in relation to the EXP2 transmembrane domain. A ring of positively
charged residues is clearly visible directly above where the PVM surface
would normally lie. b, c, Engaged state (b) and resetting state (c) maps
were low-pass filtered to 6 Å to improve clarity of low-resolution details,
and are shown overlaid, at two different thresholds to improve visibility of
the detergent belt and the poorly resolved N-terminal domains of HSP101
(teal, higher threshold; peach, lower threshold). d, Resetting state map
of PTEX displayed at a lower threshold to show the strong claw-shaped
densities extending from the PTEX150(668–823) shaft up to the HSP101
M domain. e, Enlarged view of the interaction between HSP101 Y488
and Y491 and the three-turn helix, shown with corresponding cryo-EM
density (mesh).
[Panel labels: Engaged state; Resetting state; N-terminal domain density;
Detergent belt; 3-turn helix; HSP101 MD (Protomer 3)]
Extended Data Fig. 9 | Data processing workflow. Illustration of workflow for 3D classification, focused classification and refinement. Maps are
displayed at higher thresholds where the detergent belt is not visible for clarity, to avoid obscuring details of the transmembrane helices. 
Extended Data Table 1 | Cryo-EM data collection, refinement and validation statistics
Validation
MolProbity score: 1.53 2.03 1.96 1.64
The Juno spacecraft, which is in a polar orbit around Jupiter, is
providing direct measurements of the planet’s magnetic field close
to its surface¹. A recent analysis of observations of Jupiter’s magnetic
field from eight (of the first nine) Juno orbits has provided a
spherical-harmonic reference model (JRM09)² of Jupiter’s magnetic
field outside the planet. This model is of particular interest for 
understanding processes in Jupiter’s magnetosphere, but to study 
the field within the planet and thus the dynamo mechanism that is 
responsible for generating Jupiter’s main magnetic field, alternative 
models are preferred. Here we report maps of the magnetic field at 
a range of depths within Jupiter. We find that Jupiter’s magnetic 
field is different from all other known planetary magnetic fields. 
Within Jupiter, most of the flux emerges from the dynamo region 
in a narrow band in the northern hemisphere, some of which 
returns through an intense, isolated flux patch near the equator. 
Elsewhere, the field is much weaker. The non-dipolar part of the 
field is confined almost entirely to the northern hemisphere, so there 
the field is strongly non-dipolar and in the southern hemisphere it 
is predominantly dipolar. We suggest that Jupiter’s dynamo, unlike 
Earth’s, does not operate in a thick, homogeneous shell, and we 
propose that this unexpected field morphology arises from radial 
variations, possibly including layering, in density or electrical 
conductivity, or both. 

Unlike Earth, for which the top of the dynamo region is well defined 
by the core-mantle boundary—that is, the boundary between the elec- 
trically conducting liquid-iron outer core (in which dynamo action 
occurs) and the overlying, poorly conducting rocky mantle—for 
Jupiter the corresponding region is less clearly defined. Even though 
self-sustaining dynamo action is most probably confined to depths 
below the metallic-hydrogen transition, the field may be affected by 
flow in the overlying molecular-hydrogen region, which may have
substantial electrical conductivity, especially close to the depth of the
metallic-hydrogen transition. Accordingly, we map the field at four
equally spaced radii from the surface of Jupiter (corresponding to
r = R_J = 71,492 km, where R_J is Jupiter's radius), at which the electrical
conductivity is vanishingly small, to r = 0.85R_J, the likely depth of the
metallic-hydrogen transition.

To do so requires mapping the field below the orbit of the spacecraft, 
and so we must address the instability due to downward continua- 
tion. We do so by regularizing the solution using a quadratic norm 
based on the horizontal Laplacian of the radial magnetic field, thereby 
finding the smoothest possible map of the field for a given fit to the 
observations. We select Juno magnetometer observations¹ from eight
orbits in the radial distance range from r = 1.06R_J (perijove) to r = 2.2R_J
(roughly corresponding to Juno's highest latitude), take 30-s averages of
the data (corresponding to one rotation of the spacecraft) and weight 
the data according to an estimate of their measurement uncertainty. 
Our resulting dataset consists of 1,991 observations of each of the three 
components of the magnetic field. 
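
The instability mentioned above can be made concrete with a short sketch. For a potential field of internal origin, the radial component of a spherical-harmonic term of degree n scales as (a/r)^(n+2), so continuing downward amplifies small-scale structure (and noise) far more than large-scale structure. The second function gives a relative per-degree weight for a norm based on the horizontal Laplacian of the radial field. It assumes Schmidt semi-normalized harmonics and drops constant factors. Both function names are hypothetical, and this is not the inversion code used for the maps.

```python
def radial_amplification(degree, r_from, r_to):
    """Growth of a degree-n radial-field term continued from r_from to r_to.

    Radii are in units of Jupiter's radius; the field is assumed to be a
    potential field with sources below r_to.
    """
    return (r_from / r_to) ** (degree + 2)

def smoothness_weight(degree, r_over_a):
    """Relative penalty on a degree-n coefficient for the squared horizontal
    Laplacian of B_r integrated over the sphere of radius r.

    Assumes Schmidt semi-normalized harmonics; constant factors are dropped.
    """
    n = degree
    return ((n + 1) ** 2 * (n * (n + 1)) ** 2 / (2 * n + 1)
            * (1.0 / r_over_a) ** (2 * n + 4))

# Continue from perijove (r = 1.06 R_J) down to r = 0.85 R_J.
amplification = {n: radial_amplification(n, 1.06, 0.85) for n in (1, 10, 20)}
weights = [smoothness_weight(n, 0.85) for n in range(1, 21)]
```

Because the penalty grows steeply with degree, minimizing it for a given misfit suppresses exactly the terms that downward continuation would otherwise inflate.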
In Fig. 1 we show maps of the radial component of the magnetic field
at a range of depths using our regularized inversion from the surface
to r = 0.85R_J and compare with JRM09². At all depths, positive radial
flux in the northern hemisphere is confined to a band (the
northern-hemisphere flux band), which becomes narrower with depth. Some of
the flux from this band then re-enters through an intense spot on the
equator (the Great Blue Spot), at a longitude of around 90° west (in
System III coordinates). The morphology of the magnetic field lines
is shown in Fig. 2 (an animated version of Fig. 2 is available at
https://doi.org/10.6084/m9.figshare.6828953). Elsewhere, and corresponding
to a large proportion of the surface, the radial flux is much weaker. 

The narrowing of the northern-hemisphere flux band with depth, 
and more generally the concentration of flux into increasingly localized 
regions with depth rather than, for example, the emergence of more 
small-scale spots, is surprising given our intuition acquired from map- 
ping Earth’s magnetic field at depth. It suggests that Jupiter’s magnetic 
field at depth may be morphologically simpler than expected. This field 
morphology and its contrast to Earth's field are particularly apparent in
Fig. 3, in which we show the non-dipolar part of the field (at r = 0.90R_J)
and, for comparison, Earth’s non-dipole field (at Earth’s core-mantle
boundary). Jupiter’s non-dipole field is almost entirely confined to the
northern hemisphere, where the non-dipole field peaks at 3 mT, a value
almost three times stronger than the peak dipolar field. Jupiter's field
is dipolar in the southern hemisphere and largely non-dipolar in the
northern hemisphere, unlike Earth's field.

The strong concentration of magnetic flux in the northern- 
hemisphere flux band and in the Great Blue Spot implies the existence 
of large horizontal magnetic field gradients at the borders of these 
features, which would suggest that strong secular (temporal) variation 
of the magnetic field is likely. For example, around the Great Blue Spot 
the gradient in the radial field is approximately 3 mT/(10⁷ m); with an
assumed flow speed of the order of 10⁻³ m s⁻¹ (the lower end of
estimates of flow speed), we might therefore expect secular variation of
the order of 10⁴ nT yr⁻¹. Although high, this estimate is not necessarily
inconsistent with earlier inferences of much weaker time dependency
because secular variation at such small spatial scales would be strongly 
attenuated at the altitude of the previous observations. In addition, 
this estimate will be reduced if the flow is preferentially orthogonal to 
the field gradient, although for the Great Blue Spot that is unlikely on 
geometrical grounds to be the case. Therefore, we believe that the Great 
Blue Spot offers a very promising opportunity for forthcoming Juno 
orbits to detect secular variation. 
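
The order-of-magnitude estimate above is the product of a field gradient and a flow speed, that is, advection of the field pattern past a fixed point. The sketch below reproduces the arithmetic under stated assumptions: a gradient of 3 mT over 10⁷ m and a flow speed of 10⁻³ m s⁻¹. These exponents are a reading of the values quoted in the text rather than independently confirmed figures, and the variable names are hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

# Assumed inputs (a reading of the values quoted in the text).
field_change_tesla = 3e-3       # 3 mT change in the radial field
length_scale_m = 1e7            # over roughly 10^7 m
flow_speed_m_per_s = 1e-3       # lower end of flow-speed estimates

gradient_tesla_per_m = field_change_tesla / length_scale_m

# Advective rate of change: dB/dt ~ u * dB/dx, with the flow taken to be
# parallel to the field gradient.
secular_variation_nt_per_yr = (
    flow_speed_m_per_s * gradient_tesla_per_m * SECONDS_PER_YEAR * 1e9
)
```

If the flow is oblique to the gradient, the estimate is reduced by the cosine of the angle between them, which is the caveat raised in the text.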

Numerical dynamo models in simple homogeneous shells typically
produce fields that are either strongly dipolar or dominated by
multipolar fields. Jupiter’s field is neither, being predominantly dipolar
in one hemisphere and non-dipolar in the other, suggesting that the
field is not generated in a simple homogeneous region. Here we
consider several possible explanations. First, we consider the possibility,
although unlikely, that we have observed the field in a rare transitional
state, such as a magnetic field reversal or a transition between different
dynamo states. However, such a situation cannot necessarily be
reconciled with the co-existence of strong dipole and non-dipole fields.
Instead, we next consider whether Jupiter’s internal structure could
account for the observations.

Fig. 1 | The radial component of Jupiter’s magnetic field. The plots are
shown on a Hammer equal-area projection with the central meridian at a
longitude of 180° west (System III coordinates). The colour scale depicts
the strength of the radial component of the magnetic field, with yellow-red
shades depicting field in the positive radial direction (outwards) and
green-blue shades depicting field in the negative radial direction
(inwards). a, b, A regularized solution (a) and the JRM09 solution (b) at
r = 1.00R_J; c, d, the same at r = 0.95R_J; e, f, the same at r = 0.90R_J;
g, h, the same at r = 0.85R_J. Although the regularized solution and the
JRM09 solution have a similar pattern at each depth, the regularized
solution reveals more intense and concentrated field structure. Overall,
the same basic field morphology is apparent across the range of depths
and the two models.

Starting near the top of Jupiter’s dynamo region, there is the
possibility of a stably stratified layer due to precipitation of helium. Such
a layer might axisymmetrize the field, but could also destabilize the
field. However, this scenario also seems unlikely to be able to account
for the observed hemispheric difference in the field morphology.
There is also the effect of the steep gradient in electrical
conductivity immediately above the metallic-hydrogen transition. A recent
numerically simulated dynamo including this effect shows irregular
behaviour, with one snapshot appearing similar to the
Juno-determined field. This is a possibility that requires further investigation.
Finally, another recent study has examined flow and the generation
of magnetic fields in Jupiter for three scenarios that involve
near-surface layering, although none of the scenarios produces magnetic fields
similar to that observed by Juno.

At depth, other processes may be important. In particular, the
mixture of rock and ice that probably constitutes (or constituted)
Jupiter’s core will be soluble in hydrogen at the temperature and
pressure expected there. This may lead to gradual core dissolution,
and may have been crucial in Jupiter’s thermal history. Dissolution
of rock and ice in metallic hydrogen will increase the density of the
hydrogen region. Recent Juno observations of Jupiter's gravity field are
consistent with the existence of a partially or fully dissolved core inside
Jupiter, with rock and ice non-uniformly mixed in the hydrogen out
to approximately half the radius of the planet; the region further out
may be homogeneous, except for helium rain.

Fig. 2 | Magnetic field lines. a, North polar view; b, south polar view;
c, equatorial view. The non-dipolar nature of the magnetic field in the
northern hemisphere and the dipolar nature in the southern hemisphere
are apparent. The equatorial view is centred near the Great Blue Spot and
shows the linkage of magnetic field lines that enter through the Great Blue
Spot. The contoured surface on which the field lines shown start and end
is at r = 0.85R_J, where the density of field lines is proportional to the radial
magnetic field strength and is depicted by the colour scale (red outward
flux, blue inward flux). An animated version of this figure is available at
https://doi.org/10.6084/m9.figshare.6828953.

If, as theory and observations suggest, the metallic-hydrogen 
region is layered (the upper layer solute-free and the lower layer con- 
taining dissolved rock and ice), the implications for the dynamo will 
depend on the convective instability of these layers. The upper layer is 
most probably convectively unstable, given the very large heat flux 
observed at Jupiter. The properties of the lower layer are far less 
clear. If the lower layer is stable, then dynamo action will be confined 
to the upper layer and will therefore operate in a shell with a radius 
ratio (inner to outer radii) of approximately 0.5. A similar geometry
has been investigated previously as a possible explanation for the
magnetic fields of Uranus and Neptune²⁹, albeit with a numerical
dynamo model much less sophisticated than what is now feasible.
The magnetic field map obtained from this simulation with a radius
ratio of 0.5 (see figure 16, model 5 in ref. 29) bears similarity to the
map of Jupiter's field shown here, but with an axial dipole that is much
less dominant. In addition, structure may arise from double diffusive
convection.

Alternatively, if the lower layer is convectively unstable, then it
could be convecting separately from the layer above owing to the
possible presence of a density jump at the boundary between the
layers. Convection in Jupiter’s metallic-hydrogen region can be
driven by relative density variations (Δρ/ρ) of the order of 10~°, so
even a small density jump could be impervious to convection. In this
scenario, dynamo action may occur separately in the thick lower shell
(radius ratio of less than 0.2) and in the thin upper shell (radius ratio of
approximately 0.5), with the resultant field sharing properties of both
a thick-shell dynamo (strong axial dipole) and a relatively thin-shell
dynamo (hemispheric asymmetry).

6 SEPTEMBER 2018 | VOL 561 | NATURE | 77

© 2018 Springer Nature Limited. All rights reserved.

Fig. 3 | Non-dipole radial field. a, The non-dipolar part of Jupiter's
radial magnetic field at r = 0.90R_J. b, For comparison, the non-dipolar
part of Earth’s radial magnetic field at the core-mantle boundary
(r = 0.55R_E = 3,485 km, where R_E is Earth’s radius). Almost all of Jupiter’s
non-dipole radial field is concentrated in the northern hemisphere,
whereas Earth’s field is evenly distributed throughout.
[Fig. 3 colour scales: −4.0 mT to 4.0 mT; −1.0 mT to 1.0 mT]

The presence or absence of reduced magnetic flux at high latitude 
may provide a means of distinguishing between these alternatives. If 
the lower layer is stably stratified, then convection in the outer layer 
within the tangent cylinder (the axial cylinder tangential to the inter- 
face between the two layers) may differ from that outside the tangent 
cylinder. If the lower layer is convectively unstable, then such an 
effect seems less likely to occur. To resolve this, additional Juno orbits
are required. Juno’s orbit, with perijove precessing northward by
approximately 1° per orbit, is evolving in such a way that mid- and
high-latitude structure will be better resolved towards the second half
of the planned 34-orbit baseline mission.


Data availability 

The Juno magnetometer data used in this study will be made available through 
the NASA Planetary Data System (https://pds.nasa.gov) in accordance with 
NASA policy. An animated version of Fig. 2 is available at
https://doi.org/10.6084/m9.figshare.6828953.
Synthetic three-dimensional atomic structures assembled atom by atom


Daniel Barredo, Vincent Lienhard, Sylvain de Léséleuc, Thierry Lahaye & Antoine Browaeys


A great challenge in current quantum science and technology 
research is to realize artificial systems of a large number of 
individually controlled quantum bits for applications in quantum 
computing and quantum simulation. Many experimental 
platforms are being explored, including solid-state systems, 
such as superconducting circuits¹ or quantum dots², and atomic,
molecular and optical systems, such as photons, trapped ions or
neutral atoms. The latter offer inherently identical qubits that are
well decoupled from the environment and could provide synthetic
structures scalable to hundreds of qubits or more. Quantum-gas
microscopes allow the realization of two-dimensional regular
lattices of hundreds of atoms, and large, fully loaded arrays of
about 50 microtraps (or ‘optical tweezers’) with individual control
are already available in one and two dimensions. Ultimately,
however, accessing the third dimension while keeping single-atom 
control will be required, both for scaling to large numbers and for 
extending the range of models amenable to quantum simulation. 
Here we report the assembly of defect-free, arbitrarily shaped 
three-dimensional arrays, containing up to 72 single atoms. We use 
holographic methods and fast, programmable moving tweezers to 
arrange—atom by atom and plane by plane—initially disordered 
arrays into target structures of almost any geometry. These results 
present the prospect of quantum simulation with tens of qubits 
arbitrarily arranged in space and show that realizing systems of 
hundreds of individually controlled qubits is within reach using 
current technology. 

Three-dimensional atomic arrays at half filling have been obtained 
using optical lattices with large spacings¹², which facilitate single-site 
addressability and atom manipulation¹³. As an alternative approach, 
here we use programmable holographic optical tweezers to create 
three-dimensional (3D) arrays of traps. Holographic methods offer 
the advantage of higher tunability of the lattice geometry because the 
design of optical potential landscapes is reconfigurable and only limited 
by diffraction¹⁴⁻¹⁶. In our experiment, arbitrarily designed arrays of 
up to about 120 traps are generated by imprinting a phase pattern on a 
dipole trap beam at 850 nm with a spatial light modulator (Fig. 1a). This 
phase mask is calculated using the 3D Gerchberg-Saxton algorithm, 
simplified for the case of point traps¹⁷. The beam is then focused with 
a high-numerical-aperture (0.5) aspheric lens under vacuum, creating 
individual optical tweezers with a measured 1/e² radius of about 1.1 µm 
and a Rayleigh length of approximately 5 µm. After recollimation with 
a second aspheric lens, the intensity of the trapping light is measured 
using a standard charge-coupled device (CCD) camera. An electrically 
tunable lens (ETL1) in the imaging path allows us to acquire series of 
stack images along the optical axis z, from which we reconstruct the 
full 3D intensity distribution. The imaging system covers a z-direction 
scan range of 200 µm. 
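
The phase-mask calculation can be illustrated with a minimal sketch of a Gerchberg-Saxton iteration restricted to point traps. This is not the authors' code: the paraxial grating-plus-lens phase model, the function name `gs_point_traps` and all numerical parameters (pixel count, pixel pitch, focal length) are assumptions chosen only to keep the example small and well sampled.

```python
import numpy as np

def gs_point_traps(traps_um, n_pix=128, pitch_um=50.0, focal_um=8000.0,
                   wavelength_um=0.85, n_iter=30, seed=0):
    """Gerchberg-Saxton phase retrieval restricted to a list of point traps.

    traps_um: sequence of (x, y, z) trap positions relative to the focus, in µm.
    Returns the SLM phase (n_pix x n_pix, radians) and the fraction of the
    incident power directed to each trap. All optical parameters are
    illustrative assumptions, not values from the experiment.
    """
    traps = np.asarray(traps_um, dtype=float)
    coords = (np.arange(n_pix) - (n_pix - 1) / 2) * pitch_um
    X, Y = np.meshgrid(coords, coords)
    k = 2 * np.pi / wavelength_um
    x, y, z = (traps[:, i, None, None] for i in range(3))
    # Phase that steers light to trap m: a grating (x, y) plus a lens term (z).
    delta = (k / focal_um) * (x * X + y * Y) \
        + (k * z / (2 * focal_um**2)) * (X**2 + Y**2)
    theta = np.random.default_rng(seed).uniform(0, 2 * np.pi, len(traps))
    for _ in range(n_iter):
        # SLM plane: keep only the phase of the superposed trap fields.
        phase = np.angle(np.exp(1j * (delta + theta[:, None, None])).sum(axis=0))
        # Trap positions: complex amplitude produced by that phase mask.
        field = np.exp(1j * (phase[None] - delta)).mean(axis=(1, 2))
        theta = np.angle(field)
    return phase, np.abs(field) ** 2

# Bilayer test pattern: two 2 x 2 layers, 17 µm apart, laterally offset.
layer_a = [(x, y, 0.0) for x in (-5.0, 5.0) for y in (-5.0, 5.0)]
layer_b = [(x, y, 17.0) for x in (0.0, 10.0) for y in (0.0, 10.0)]
phase, power_fraction = gs_point_traps(layer_a + layer_b)
```

The plain iteration maximizes the total light sent to the traps but does not equalize them; a uniformity of a few per cent, as reported in the text, would need an additional weighting or closed-loop correction step.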

Figure 1b-d shows some examples of patterns suitable for experiments 
with single atoms. The images are reconstructed using a maximum-intensity 
projection method¹⁸ from 200 z images obtained with the 
diagnostics CCD camera. With about 3.5 mW of power per trap we 
reach depths of U_0/k_B ≈ 1 mK, where k_B is the Boltzmann constant, and 
radial (longitudinal) trapping frequencies of around 100 kHz (20 kHz). 
We produce highly uniform microtrap potentials (with peak inten- 
sities differing by less than 5% root mean square) via a closed-loop 

Fig. 1 | Experimental setup and trap images. a, We combine a spatial light 
modulator (SLM) and a high-numerical-aperture aspheric lens (AL) under 
vacuum to generate arbitrary 3D arrays of traps. The intensity distribution 
in the focal plane is measured with the aid of a second aspheric lens, a 
mirror (M) and a diagnostics CCD camera (d-CCD). The fluorescence of 
the atoms in the traps at 780 nm is separated from the dipole trap beam 
with a dichroic mirror (DM) and detected using an electron-multiplying 
CCD camera (EMCCD). For atom assembly we use moving tweezers 
superimposed on the trap beam with a polarizing beam splitter (PBS). 
This extra beam is deflected in the plane perpendicular to the beam 
propagation with a 2D acousto-optical deflector (AOD), and its focus 
can be displaced axially by changing the focal length of an electrically 
tunable lens (ETL3). The remaining electrically tunable lenses (ETL1 
and ETL2) in the camera paths allow imaging of different planes along 
z. The inset depicts the intensity distribution of the trap light forming a 
bilayer array (red) and the action of the moving tweezers on an individual 
atom (purple). b-d, Intensity reconstructions of exemplary 3D patterns 
obtained from a collection of z-stack images taken with the diagnostics 
CCD camera. The regions of maximum intensity form a trefoil knot (b), 
a 5 × 5 × 5 cubic array (c) and a C329 fullerene-like structure (d). The 
dimensions, L_x, L_y, L_z, of the images are the same in all the examples. 

Fig. 2 | Single-atom fluorescence in 3D arrays. a-f, Maximum-intensity- 
projection reconstruction of the average fluorescence of single atoms 
loaded stochastically into exemplary arrays of traps. The x, y, z scan range 
of the fluorescence (L_x, L_y, L_z) is the same for all the 3D reconstructions. 


optimization¹⁴. Rubidium-87 atoms are then loaded in the traps from 
a magneto-optical trap (MOT), with a final temperature of 25 µK. We 
detect the occupancy of each trap by collecting the fluorescence of the 
atoms at 780 nm with an electron-multiplying CCD camera for 50 ms. 
A second tunable lens (ETL2) in the imaging path is used to focus the 
fluorescence of different atom planes. 
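
As a consistency check on the trap parameters quoted above, the trapping frequencies can be estimated from the depth, waist and Rayleigh length. The sketch below assumes an ideal Gaussian beam in the harmonic approximation, ω_r = sqrt(4U_0/(m w_0²)) and ω_z = sqrt(2U_0/(m z_R²)); the function name and the rubidium-87 mass constant are introduced here for illustration.

```python
import math

K_B = 1.380649e-23          # Boltzmann constant, J/K
AMU = 1.66053906660e-27     # atomic mass unit, kg
M_RB87 = 86.909 * AMU       # mass of rubidium-87, kg

def tweezer_frequencies(depth_K, waist_m, rayleigh_m, mass_kg=M_RB87):
    """Radial and axial trap frequencies (Hz) of an ideal Gaussian tweezer."""
    u0 = K_B * depth_K
    omega_r = math.sqrt(4 * u0 / (mass_kg * waist_m**2))
    omega_z = math.sqrt(2 * u0 / (mass_kg * rayleigh_m**2))
    return omega_r / (2 * math.pi), omega_z / (2 * math.pi)

# Values quoted in the text: 1 mK depth, 1.1 µm waist, 5 µm Rayleigh length.
f_r, f_z = tweezer_frequencies(1e-3, 1.1e-6, 5e-6)
print(f"radial: {f_r / 1e3:.0f} kHz, axial: {f_z / 1e3:.0f} kHz")
```

Under these assumptions the estimate gives roughly 90 kHz radially and 14 kHz axially, the same order as the quoted values of around 100 kHz (20 kHz); the residual difference is plausible for a real, non-ideal beam.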

In Fig. 2 we show the fluorescence of single atoms trapped in 
various complex 3D structures, some of which are relevant, for instance, to 
the study of non-trivial properties of Chern insulators¹⁹⁻²¹. Each example 
is reconstructed from a series of 100 z-stack images covering an 
axial range of about 120 µm. With no further action, these arrays are 
randomly loaded with a filling fraction of about 0.5; we thus average 
the fluorescence signal over 300 frames to reveal the geometry of the 
structures. 

For deterministic atom loading, we extend our two-dimensional 
(2D) atom-by-atom assembler¹¹ to 3D geometries. For that, we superimpose 
a second 850-nm laser beam (with 1/e² radius of about 1.3 µm) 
on the trapping beam, which can be steered in the x-y plane using a 
2D acousto-optical deflector and in the z direction by changing the 
focal length of a third tunable lens (ETL3). Combined with a real-time 
control system, the moving tweezers can perform single-atom transport 
with fidelities exceeding 0.993, as shown in ref. 11, and produce fully 
loaded arrays by using independent and sequential rearrangement of 
the atoms for each of the n_z planes in the 3D structures. 

To explore the feasibility of plane-by-plane atom assembly, we first 
determine the minimal separation between layers so that each tar- 
get plane can be reordered without affecting the others. To quantify 
this, we perform the following experiment in a 2D array containing 
46 traps. We randomly load the array with single atoms and demand 
the atom assembler to remove all the atoms. We average over about 
50 realizations and then repeat the experiment for different axial sep- 
arations between the position of the moving tweezers and the trap 
plane. The result is shown in Fig. 3a, where we see that for separations 
beyond about 17 µm the effect of the moving tweezers on the atoms 
is negligible. This distance can be further reduced to about 14 µm by 
operating the moving tweezers with less power, without any degradation 
in the performance of the sorting process. In a complementary 
experiment, where we fully assembled small arrays, we also checked 
that the assembling efficiency is not affected by slight changes (below 
about 3 µm) in the exact axial position of the moving tweezers. 

We now demonstrate full loading of arbitrary 3D lattices using plane- 
by-plane assembly. We start by creating a 3D trap array that can be 
decomposed in several planes normal to z. In each plane we generate 
approximately twice the number of traps that we need to load, so that 
we can easily load enough atoms to assemble the target structure. The 
sequence used to create fully loaded patterns (see Fig. 3b) starts by 
loading the MOT and monitoring the atoms entering and leaving the 
traps by sequentially taking a fluorescence picture for each plane. We 
trigger the assembler as soon as there are enough atoms in each plane 
to fully assemble it. We then freeze the loading by dispersing the MOT 
cloud and record the initial positions of the atoms by another series of 
z-stack images. Analysis of the images reveals which traps are filled with 
single atoms. We use this information to compute (in about 1 ms) the 
moves needed to create the fully loaded target array and perform plane- 
by-plane assembly by changing the z position of the moving tweezers 
after the assembly in each plane is completed. Finally, we detect the final 
3D configuration with another series of z-stack images. 
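
The move-computation step of this sequence can be sketched as follows. The text does not specify the assembler's actual heuristic, so the greedy nearest-spare rule below is an assumption, and the names `plan_plane_moves` and `plan_assembly` are hypothetical; the sketch only shows how an occupancy map per plane turns into a list of source-to-target moves.

```python
import numpy as np

def plan_plane_moves(trap_xy, occupied, target):
    """Plan moves for one plane.

    trap_xy:  (N, 2) trap positions in the plane.
    occupied: length-N booleans, traps that currently hold an atom.
    target:   length-N booleans, traps that must be filled.
    Returns (moves, leftovers): moves is a list of (source, destination)
    trap indices; leftovers are surplus atoms to be discarded.
    """
    trap_xy = np.asarray(trap_xy, dtype=float)
    occupied = np.asarray(occupied, dtype=bool)
    target = np.asarray(target, dtype=bool)
    empty_targets = [int(t) for t in np.flatnonzero(target & ~occupied)]
    spares = [int(s) for s in np.flatnonzero(occupied & ~target)]
    if len(spares) < len(empty_targets):
        raise ValueError("not enough atoms in this plane; keep loading")
    moves = []
    for t in empty_targets:
        # Greedy choice (assumption): take the closest spare atom.
        dist = np.linalg.norm(trap_xy[spares] - trap_xy[t], axis=1)
        moves.append((spares.pop(int(np.argmin(dist))), t))
    return moves, spares

def plan_assembly(planes):
    """Plan each plane independently, in the order the tweezers visit them."""
    return [plan_plane_moves(*plane) for plane in planes]

# One plane of four traps on a line; trap 1 must be filled from a spare atom.
xy = [(0, 0), (5, 0), (10, 0), (15, 0)]
plan = plan_assembly([(xy, [True, False, True, True],
                       [True, True, False, False])])
print(plan)
```

Treating the planes independently mirrors the plane-by-plane procedure described in the text; the `ValueError` corresponds to the condition that the assembler is triggered only once every plane holds enough atoms.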

Figure 3c-h shows a gallery of fully loaded 3D atomic arrays arbi- 
trarily arranged in space. We can create fully loaded 3D architectures 
with up to 72 atoms distributed in several layers with different degrees 
of complexity. The selected structures include simple cubic lattices 
(Fig. 3d), bilayers with square or graphene-like²² arrangements (Fig. 3c, 
e, g), lattices with inherent geometrical frustration such as pyrochlore²³ 
(Fig. 3f) and lattices with cylindrical symmetry (Fig. 3h), which are 
suitable, for example, for studying quantum Hall physics with neutral 
atoms²⁴. The arrays are not restricted to periodic arrangements, and the 
positions of the atoms can be controlled with high accuracy (<1 µm). 
The minimum interlayer separation that we can achieve depends on 
the type of underlying geometry. This is illustrated in Fig. 3e, which 
shows the full 3D assembly of a bilayer square lattice (with a layer separation 
of d_z = 5 µm). There, sites corresponding to the second layer 
are displaced by half the lattice spacing. Because traps belonging to 
neighbouring layers do not have the same (x, y) coordinates, there is 
no limitation to the minimum interlayer distance that we can produce. 
In both images we can observe a defocused fluorescence at intersite 
positions due to atoms trapped in the neighbouring layer. By contrast, 
whenever traps are aligned along the z axis (for example, in Fig. 3d), we 
set a minimum axial separation of about 17 µm to avoid any disturbance 
from the moving tweezers on the atoms while assembling neighbouring 
planes. However, for some trapping geometries this constraint can be 
overcome by applying a small global rotation of the 3D trap pattern 
around the x or y axis, so that neighbouring traps do not share the 
same (x, y) coordinates. The minimum interlayer spacing ultimately 
depends on the Rayleigh range of our trapping beam (about 5 µm) and 
could be further reduced, for example, by using an aspheric lens with 
higher numerical aperture. The range of interatomic distances that we 
can achieve (3–40 µm) is suitable for implementing fast qubit gates²⁵ 
or simulating excitation transport²⁶ and quantum magnetism with 
Rydberg atoms, because interaction energies between Rydberg states 
at those distances are typically in the megahertz range. 

To illustrate this possibility, we performed a proof-of-principle 
experiment with two atoms belonging to the cylindrical lattice dis- 
played in Fig. 3h. The atoms are separated by a total distance of 
R_12 = 20 µm (d_xy = 10 µm, d_z = 17 µm); see Fig. 4. We first initialize the 
atoms in state |g⟩ = |5S_{1/2}, F = 2, m_F = 2⟩, where F and m_F are the hyperfine 
and magnetic quantum numbers, respectively, by optical pumping 
in a 47-G magnetic field that defines the quantization axis and is 
aligned perpendicular to the internuclear axis. Then, the dipole trap is 
switched off and a two-photon Rydberg stimulated Raman adiabatic 
passage²⁷ excites both atoms to the |↑⟩ = |60S_{1/2}, m_J = 1/2⟩ Rydberg 
state, where m_J is the spin projection along the magnetic field direction. 

Fig. 3 | Fully loaded 3D arrays of single atoms. a, Recapture probability 
as a function of the axial distance between the focus of the moving 
tweezers and the plane of the atoms measured experimentally by trying to 
remove all the atoms from a 46-trap array. Error bars denote the standard 
error of the mean and are smaller than the symbol size. The line is a guide 
to the eye. b, Time control sequence of the experiment. We start the 
experiment by recording sequentially an image for each target plane. The 


We further use a resonant microwave field and local addressing²⁸ to 
transfer the second atom to the |↓⟩ = |60P_{1/2}, m_J = −1/2⟩ state, while 
the first atom remains in |↑⟩. In these two Rydberg levels, the atoms are 
coupled by a direct dipole-dipole interaction with a strength of 
$U = C_3/R_{12}^3$ and a calculated C_3 coefficient of C_3 = h × 1,357 MHz µm³, 
where h is the Planck constant. The prepared pair-state |↑↓⟩ evolves 
under the XY-spin Hamiltonian $H = (C_3/R_{12}^3)(\sigma_1^+\sigma_2^- + \sigma_1^-\sigma_2^+)$ 
(where σ_i^± denotes the Pauli matrices acting on atom i = {1, 2}) and 
undergoes coherent spin-exchange oscillations between |↑↓⟩ and |↓↑⟩ 


[Fig. 3 panel labels: d, Cubic lattice (d_z = 20 µm; 36 atoms); f, Pyrochlore-like lattice (d_z = 25 µm; 42 atoms); h, Cylindrical lattice (d_z = 17 µm; 42 atoms). Timing axis of panel b: (50 ms × n_z), (60 ms × n_z), t.] 


analysis of the resulting np images reveals the initial position of the atoms 
in the traps. The 2D atom assembler, in combination with an electrically 
tunable lens (ETL3), arranges the atoms plane by plane. Finally, a new set 
of sequential images is collected to capture the result of the 3D assembly. 
c-h, Fully loaded arrays with arbitrary geometries. All images are single 
shots. The models of the 3D configurations are shown for clarity; the 
colours of the frames around the images encode successive atomic planes. 


as a function of the variable interaction time, T. Finally, a de-excitation 
sequence projects the population in |↑⟩ to |g⟩, but leaves the population 
in |↓⟩ unaffected. After switching the dipole trap on again, atoms in |g⟩ 
are recaptured, while atoms in the excited state |↓⟩ are repelled by the 
trapping potential of the optical tweezers and appear as atom losses in 
the final fluorescence images. The outcome of this experiment is shown 
in Fig. 4. We observe coherent ‘flip-flops’ between |↑↓⟩ and |↓↑⟩ with a 
measured frequency of 2U/h = 333 ± 5 kHz. This value is consistent 
with the frequency 2U/h = 339 kHz expected from our distance 

Fig. 4 | Spin-exchange dynamics between two Rydberg atoms in 
different z layers. Excitation-hopping oscillations between |↑↓⟩ and 
|↓↑⟩, observed in the populations P_↑↓, P_↓↑, driven by the dipole-dipole 
interaction between two Rydberg states, |↑⟩ = |60S_{1/2}, m_J = 1/2⟩ and 
|↓⟩ = |60P_{1/2}, m_J = −1/2⟩, at a distance of about 20 µm (d_xy = 10 µm; 
d_z = 17 µm). Error bars represent the standard error of the mean and are 
mostly smaller than the symbol size. Solid lines are damped sine fits to the 
data. The direction of the magnetic field, B, is indicated. 


calibration (R_12 = 20 ± 1 µm), which was performed by optical means. 
The finite contrast and the small damping of the oscillations arise from 
experimental imperfections (errors in state preparation and readout, 
residual atomic temperature), as reported in ref. 7°. This proof-of- 
principle experiment demonstrates the feasibility of performing 
quantum simulations using our defect-free 3D atomic arrays of single 
atoms. Excitations hopping under the influence of this Hamiltonian 
are equivalent to a system of hard-core bosons. The dipole-dipole 
interactions observed here can be further exploited to engineer 
Hamiltonians containing complex hopping amplitudes, which are 
suitable for the study of, for example, topological insulators””. 
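
The expected exchange frequency follows directly from the quoted C_3 coefficient and interatomic distance. The short sketch below reproduces that arithmetic and the ideal two-level flip-flop; it neglects damping and finite contrast, and the function names are introduced here for illustration only.

```python
import math

C3_OVER_H = 1357.0  # MHz µm^3, the C_3/h value quoted in the text

def exchange_frequency_khz(r_um):
    """Population oscillation frequency 2U/h in kHz for U = C_3 / R^3."""
    return 2.0 * C3_OVER_H / r_um**3 * 1e3

def transfer_probability(t_us, r_um):
    """Ideal probability of finding the pair flipped after time t (µs),
    starting from the unflipped pair state; no damping is included."""
    u_over_h = C3_OVER_H / r_um**3          # coupling U/h in MHz
    return math.sin(2.0 * math.pi * u_over_h * t_us) ** 2

r_geom = math.hypot(10.0, 17.0)             # distance from the two offsets, µm
f_nominal = exchange_frequency_khz(20.0)    # using the calibrated 20 µm
f_low, f_high = exchange_frequency_khz(21.0), exchange_frequency_khz(19.0)
t_swap = 1.0 / (4.0 * C3_OVER_H / 20.0**3)  # first full transfer, µs
print(f"R = {r_geom:.1f} µm, 2U/h = {f_nominal:.0f} kHz "
      f"({f_low:.0f} to {f_high:.0f} kHz for R = 20 ± 1 µm)")
```

Because the frequency scales as R⁻³, the ±1 µm calibration uncertainty alone spans roughly 290 to 400 kHz, which comfortably contains the measured 333 ± 5 kHz.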

Besides the unique tunability of the geometries that it provides, our 
atom-assembling procedure is highly efficient: we reach typical filling 
fractions of 0.95. This measured efficiency is slightly dependent on the 
number of planes and is mainly limited by the lifetime of the atoms in 
the traps (about 10 s) and the duration of the sequence (we typically need 
60 ms per plane to acquire the fluorescence images and about 50 ms per 
plane to perform atom sorting). The repetition rate of the experiment 
is about 1 Hz. The number of traps and the filling fraction of the arrays 
could be further increased with current technology: (i) the volume of 
the trap array and the maximum number of traps can be extended by 
increasing the field of view of the aspheric lens and the laser power; 
(ii) the lifetime of the atoms in the traps can realistically be increased 
by an order of magnitude; (iii) the repetition rate of the experiment 
can be increased by optimizing the atom assembler¹¹, in particular by 
transferring atoms also between different planes³¹; and (iv) the initial 
filling fraction of the arrays could reach values exceeding 0.8 by using 
tailored light-assisted collisions³²,³³. Therefore, the generation of three-dimensional 
structures containing several hundred atoms at unit filling 
seems within reach, opening up many new possibilities in quantum 
information processing and quantum simulation with neutral atoms. 
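
The statement that the filling fraction is limited mainly by the trap lifetime and the sequence duration can be made concrete with a rough estimate. The model below is an assumption, not the authors' analysis: each atom survives the imaging-plus-sorting time with probability exp(−t/τ), and a single transport fidelity factor is applied to every atom, which is pessimistic for atoms that are never moved.

```python
import math

def expected_filling(n_planes, lifetime_s=10.0, image_s=0.060,
                     sort_s=0.050, transport_fidelity=0.993):
    """Crude filling-fraction estimate for an n_planes structure.

    Timings and lifetime are the values quoted in the text; the way they
    are combined here is a simplifying assumption.
    """
    duration = n_planes * (image_s + sort_s)
    return transport_fidelity * math.exp(-duration / lifetime_s)

for n in (1, 2, 3, 5):
    print(n, round(expected_filling(n), 3))
```

For three planes this gives about 0.96, close to the typical 0.95 reported, and it reproduces the weak dependence on the number of planes. A tenfold longer lifetime would move the vacuum-limited part of the loss to the per-mille level.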

Sorting ultracold atoms in a three-dimensional optical lattice in a realization of Maxwell’s demon 


Aishwarya Kumar, Tsung-Yao Wu, Felipe Giraldo & David S. Weiss 


In 1872, Maxwell proposed his famous ‘demon’ thought experiment¹. 
By discerning which particles in a gas are hot and which are cold, 
and then performing a series of reversible actions, Maxwell's demon 
could rearrange the particles into a manifestly lower-entropy state. 
This apparent violation of the second law of thermodynamics 
was resolved by twentieth-century theoretical work²: the entropy 
of the Universe is often increased while gathering information³, 
and there is an unavoidable entropy increase associated with the 
demon’s memory⁴. The appeal of the thought experiment has led 
many real experiments to be framed as demon-like. However, past 
experiments had no intermediate information storage⁵, yielded only 
a small change in the system entropy⁶,⁷ or involved systems of four 
or fewer particles⁸⁻¹⁰. Here we present an experiment that captures 
the full essence of Maxwell’s thought experiment. We start with a 
randomly half-filled three-dimensional optical lattice with about 60 
atoms. We make the atoms sufficiently vibrationally cold so that the 
initial disorder is the dominant entropy. After determining where 
the atoms are, we execute a series of reversible operations to create 
a fully filled sublattice, which is a manifestly low-entropy state. Our 
sorting process lowers the total entropy of the system by a factor of 
2.44. This highly filled ultracold array could be used as the starting 
point for a neutral-atom quantum computer. 

With an eye towards quantum computing and quantum simula- 
tion applications, there has been a recent boom in cold-atom sorting 
experiments. Atoms in a variety of arrays of dipole light traps have 
been impressively rearranged by moving individual traps⁶,⁷,¹¹,¹². The 
entropy associated with disordered occupancy in those cases is at 
most about 10% of the system entropy⁷, which is dominated by vibrational 
excitation in the traps. Good vibrational cooling, along with 
well-sorted atoms, is required for cold collision-based quantum gates 
or quantum simulations. For Rydberg-based gates or simulations¹³, 
although atoms are not strictly required to be vibrationally cold, colder 
is better. Rydberg gates using colder atoms are likely to yield higher 
fidelity because the atoms are less likely to change vibrational states 
during the gate, which can undesirably entangle atomic motion with 
qubit states. In general, better-localized atoms allow higher-fidelity 
addressing of individual atoms¹⁴. In blue-detuned traps, cold atoms 
see less light and thus scatter fewer trapping photons, which leads 
to longer coherence times. For instance, the coherence time in our 
experiment now exceeds 12 s. 

Four atoms in a one-dimensional optical lattice⁸ have been compacted 
using a method¹⁵ similar to the one that we demonstrate here 
with 50 atoms in three dimensions. We note that at least about 50 qubits 
are needed for a quantum computer to perform a calculation that cannot 
be accomplished on a classical computer¹⁶. A three-dimensional 
(3D) geometry gives atoms many more nearby neighbours, which pro- 
vides higher connectivity in the system. It also allows for a broad range 
of quantum simulations and is favourable for further scaling of the 
number of atoms in the system. 

Our experiment proceeds as follows. We prepare a randomly 
56%-filled blue-detuned 3D lattice with 4.8 µm lattice spacing¹⁷. By 
imaging polarization-gradient-cooling laser light, we determine the 


occupancy across the lattice with an error of 10~? per site in 800 ms 
(ref. 17). Projection sideband cooling¹⁸ puts 89% of the caesium 
atoms into their vibrational ground states and >99.7% of them in the 
|F = 4, m_F = −4⟩ hyperfine ground state, where F and m_F are the hyperfine 
and magnetic quantum numbers, respectively. We then combine 
the ability to address atoms at individual sites (by using crossed laser 
beams and microwaves to make site-dependent state changes¹⁹) with 
the ability to make state-dependent lattice translations (by rotating the 
lattice beam polarizations²⁰). Starting from a given 3D occupancy map 
we devise a sequence of operations to fill up either a 5 × 5 × 2 or a 
4 x 4 x 3 sublattice. 

We can target any site in a 5 × 5 × 5 lattice by using a pair of focused 
addressing beams intersecting at a right angle¹⁴,¹⁹. Targeting proceeds 
as in our previous demonstration of high-fidelity single-qubit 
gates¹⁴, but the magnetic sublevels are different and in this case we 
are unconcerned with long-term quantum coherence. The addressing 
beams shift the |F = 4, m_F = −4⟩ to |F = 3, m_F = −3⟩ resonance by 
around 50 kHz, which allows us to drive the associated microwave transition 
using an adiabatic fast-passage pulse (see Methods for details) 
that transfers only the target atom. An atom making the transition from 
m_F = −4 to m_F = −3 moves from the ‘stationary’ to the ‘motion’ state. 

The linear polarizations of the two beams that create the lattice in 
a given direction are initially aligned, so the two states are trapped 
nearly identically. When the polarization of one of the lattice beams is 
rotated (using two electro-optic modulators and a λ/4 plate, where λ is 
the wavelength), the optical lattices for the two states move in opposite 
directions (see Fig. 1a). After rotating the polarization by π, we optically 
pump the atoms back to the stationary state and rotate the polarization 
back. The net effect of this sequence is that atoms that start in the sta- 
tionary state move but return to the same place, while atoms that start 
in the motion state are shifted by one lattice site. 
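
The net effect of one motion step can be captured in a toy model that tracks atom positions in units of the lattice spacing. The sign convention (motion-state atoms move towards positive positions) and the function name `motion_step` are assumptions made for illustration; the half-site displacements follow the description of the polarization ramp.

```python
import numpy as np

def motion_step(positions, in_motion_state):
    """Apply one state-dependent lattice translation along one axis.

    positions:       site indices of the atoms along the chosen axis.
    in_motion_state: booleans, True for atoms flipped to the 'motion' state.
    Returns the site indices after the full ramp-pump-ramp sequence.
    """
    pos = np.asarray(positions, dtype=float)
    moving = np.asarray(in_motion_state, dtype=bool).copy()
    # Polarization angle ramped 0 -> pi: the two states move in opposite
    # directions by half a lattice spacing.
    pos = pos + np.where(moving, 0.5, -0.5)
    # Optical pumping returns every atom to the stationary state.
    moving[:] = False
    # Angle ramped pi -> 0: all atoms now follow the stationary-state lattice.
    pos = pos + 0.5
    return pos.astype(int)

print(motion_step([0, 1, 3], [False, True, False]))  # atom at site 1 moves to 2
```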

The sorting algorithm for compacting atoms in the lattice was proposed 
in previous work¹⁵,²¹; we have slightly modified it to allow the 
filling of any continuous sublattice (see Methods). The general idea is to 
first perform a series of balancing steps in the x and y directions so that 
every row in the z direction has the required number of atoms to fill a 
desired number of planes. Then, a series of compaction steps in the z 
direction moves atoms to fill the planes of the target sublattice (Fig. 1b). 
For example, to fill a 5 × 5 × 2 sublattice from a half-filled 5 × 5 × 5 
lattice, atoms are first ‘balanced’ in the x and y directions so that every 
row in the z direction has at least two atoms. Parallel z-motion steps 
then move the atoms to the desired planes. After sorting, we reimage 
the atoms and repeat the procedure to correct any errors. The ability to 
know exactly where the vacancies are is an advantage of this approach 
to filling a lattice compared to implementing a superfluid-Mott insulator 
transition²², where residual occupancy errors are unknown. 
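
The compaction stage can be illustrated with a toy version that handles only the z direction. It assumes that balancing has already left every z row with enough atoms, ignores spare atoms (treating them as removed), and picks an order-preserving set of atoms for each row; none of this is taken from the Methods, and `compact_columns` is a hypothetical name.

```python
import numpy as np

def compact_columns(occ, target_planes):
    """Toy z-compaction of a balanced occupancy map.

    occ:           boolean array (nx, ny, nz), True where a site holds an atom.
    target_planes: consecutive z indices that should end up fully filled.
    Returns (filled, n_steps): the occupancy of the target sublattice and the
    largest single-atom displacement, which sets the number of parallel
    one-site motion steps needed.
    """
    occ = np.asarray(occ, dtype=bool)
    planes = np.asarray(target_planes)
    k = len(planes)
    filled = np.zeros_like(occ)
    n_steps = 0
    for ix, iy in np.ndindex(occ.shape[0], occ.shape[1]):
        z_atoms = np.flatnonzero(occ[ix, iy])
        if len(z_atoms) < k:
            raise ValueError(f"row ({ix}, {iy}) is not balanced")
        # Atoms cannot pass through each other, so use k neighbouring atoms
        # in order and keep the choice with the smallest worst-case shift.
        shifts = [np.abs(z_atoms[s:s + k] - planes).max()
                  for s in range(len(z_atoms) - k + 1)]
        n_steps = max(n_steps, int(min(shifts)))
        filled[ix, iy, planes] = True
    return filled, n_steps

# Every z row holds atoms at z = 0 and z = 4; compact them into planes 1 and 2.
start = np.zeros((5, 5, 5), dtype=bool)
start[:, :, 0] = True
start[:, :, 4] = True
filled, n_steps = compact_columns(start, [1, 2])
print(int(filled.sum()), n_steps)
```

Because all rows are shifted in parallel, the number of motion steps is set by the worst row rather than by the total number of atoms.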

Figure 2 shows two implementations of this algorithm, in which target 
sublattices were completely filled after two sorts. In general, starting 
with at least half the lattice sites filled in a 5 × 5 × 5 array, three 
sorts leave us with an average filling fraction of 0.97 for 5 x 5 x 2 and 
0.95 for 4 x 4 x 3. We achieve the perfect filling shown in Fig. 2f and 
Fig. 2c 32% and 27% of the time, respectively. For the first sort, the 

Fig. 1 | Motion steps and sorting algorithm. a, Motion steps used to 
fill a vacancy in a given direction. The curves show the lattice potential 
as a function of position for the ‘motion’ state (orange curve) and 
the ‘stationary’ state (blue curve). Brown curves indicate overlapping 
potentials. The arrows denote the direction of a time series in which the 
angle (θ) between the polarizations of the two lattice beams is adiabatically 
ramped to π and back to 0. The atom to be moved is transferred to the 
motion state (orange circle) using targeted addressing at the beginning of 
the time series. As the polarization of one of the lattice beams is rotated, 
the atoms in the motion state and the stationary state (blue circle) move 
in opposite directions, settling half a lattice spacing away from their 


average number of motion steps is 6.4 (5.6) and the average number 
of addressing operations is 38 (62) for filling a 5 × 5 × 2 (4 × 4 × 3) 
sublattice. Each sort takes about 190 ms on average. Figure 3 shows 
the filling fraction as a function of the number of sorts. These num- 
bers match well with Monte Carlo simulations that consider measured 
sources of error (see Methods). A major source of error for atoms in 
both the motion and stationary states is spontaneous emission from the 
lattice. The spontaneous emission rate is significantly higher (17 times 
on average) during a motion step because the lattice intensity is not 
zero at the trap minima during the motion (see Fig. 1a). When an atom 
spontaneously emits a photon and changes hyperfine state, it becomes 
anti-trapped and is lost. The measured average loss per motion step is 
about 4 x 10~*. Another source of error is imperfect transfer of atoms 
from the stationary state to the motion state. Our measured transfer 
fidelity is 0.986, limited by a combination of imperfect addressing beam 
shape, pointing noise of the addressing beams and magnetic field fluc- 
tuations. This error can cause two atoms to end up in the same lattice 
site, both of which are lost during imaging. The number of sorts that 
can be performed to fill errors is eventually limited by the 92-s vacuum 
lifetime and by double-atom loss. Optical pumping leads to a modest 
amount of heating, exciting about 7% of the population from the 3D 
vibrational ground state per motion step. Were we to replace the more 
convenient optical pumping with targeted addressing, this number 
would be reduced to 0.6%. 

After sorting and a final round of projection cooling, we measure the 
vibrational sidebands to determine the final ground-state occupation, 
as shown in Fig. 4. Projection sideband cooling (see Methods) leads to 
ground-state occupation probabilities of 0.949(7), 0.954(6) and 0.985(1) 
in the x, y and z directions, respectively, which implies 89% occupation 
of the 3D vibrational ground state. The state is not thermal, but most 
of the population is in the lowest three levels. We calculate that the 
vibrational entropy for this state is about 0.59 k_B per particle, where k_B 
is the Boltzmann constant. 

original positions when θ = π. The atom in the motion state is then 
optically pumped to the stationary state (illustrated by the red arrow). As 
the polarization is rotated back, both atoms move in the same direction, 
with the atom that started in the stationary state returning to its original 
position and the atom that started in the motion state moving by one 
lattice site. b, Simplified illustration of two parts of the sorting algorithm 
in a 3 × 3 × 3 lattice. Orange and blue circles are as in a; empty circles 
denote empty sites. The first motion step ‘balances’ the array so that every 
z row has exactly two atoms. The second motion step ‘compacts’ atoms 
into two planes. 

The configurational entropy per particle is given by²³ 

$$S = -k_B\left[\ln\eta + \frac{1-\eta}{\eta}\ln(1-\eta)\right],$$

where η is the filling fraction. The solid blue line in Fig. 3 shows 
the configurational entropy as a function of the number of sorts, and 
the dotted line shows the vibrational entropy after projection cool- 
ing. Sorting reduces the configurational entropy by a factor of 8 and 
the total entropy by a factor of 2.44. The final total entropy per 
particle is 0.75 k_B. 
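
The quoted entropy reductions can be checked numerically under a stated assumption: that the configurational entropy per particle takes the ideal-mixing form S/k_B = −[ln η + ((1 − η)/η) ln(1 − η)], with η the filling fraction. The sketch also assumes that the vibrational entropy of 0.59 k_B per particle applies both before and after sorting; the function name is introduced here for illustration.

```python
import math

def config_entropy_per_particle(eta):
    """Configurational entropy per particle in units of k_B
    (ideal-mixing form, assumed here) for filling fraction eta."""
    if not 0.0 < eta <= 1.0:
        raise ValueError("filling fraction must lie in (0, 1]")
    if eta == 1.0:
        return 0.0
    return -(math.log(eta) + (1.0 - eta) / eta * math.log(1.0 - eta))

S_VIB = 0.59                                   # k_B per particle, from the text
s_initial = config_entropy_per_particle(0.56)  # randomly 56%-filled lattice
s_final = config_entropy_per_particle(0.97)    # 5 x 5 x 2 after three sorts
config_ratio = s_initial / s_final
total_ratio = (s_initial + S_VIB) / (s_final + S_VIB)
print(f"initial {s_initial:.2f}, final {s_final:.2f}, "
      f"config ratio {config_ratio:.1f}, total ratio {total_ratio:.2f}")
```

With these inputs the configurational entropy falls from about 1.22 k_B to about 0.14 k_B per particle and the total by a factor of roughly 2.5, in line with the quoted factors of 8 and 2.44 once the slightly lower 4 × 4 × 3 filling fraction is averaged in.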

The number of required motion steps scales as N^{1/3}, where N is the 
number of atoms to be sorted¹⁵,²¹. Similar scaling for state flipping 
could be obtained if the addressing beams were generated holograph- 
ically; such a versatile 3D light pattern would allow many atoms to be 
state-flipped with microwaves simultaneously. Monte Carlo simulations 
using our current sequential addressing scheme and error rate show 
that starting from a half-filled 10 × 10 × 10 lattice, 10 × 10 × 4 and 
7 × 7 × 7 sublattices can be filled to a filling fraction of about 0.93. 
The error due to the motion could be reduced by further detuning the 
lattice light. Tripling the lattice detuning would decrease the sponta- 
neous emission rate by a factor of 9 and the lattice depth by a factor of 
3. Although the resulting lower trap frequency would require that we 
move atoms three times more slowly, the total spontaneous emission 
per motion step would be reduced by a factor of three, which would 
improve the filling fraction to approximately 0.975 for about 400 sorted 
atoms. It should be possible to improve microwave transfer errors by 
an order of magnitude by improving magnetic field stabilization and 
adapting our phase gate¹⁴, which is insensitive to addressing-beam 
intensity fluctuations. Because atoms in a 3D lattice geometry have 
many near neighbours, small known filling errors can be readily incor- 
porated into the design of any quantum computation. 

We now more fully discuss our characterization of this experiment 
as the first, to our knowledge, to capture the full essence of Maxwell's 
demon on a large array of particles. Any process that involves selectively 
acting on particles differently depending on their momentum, energy 
or internal state, like all laser-cooling methods*?4”°, evokes an aspect 
of Maxwell’s demon, who sorted particles based on velocity. However, 
when there is no stored information, such mechanisms differ in spirit 
from Maxwell’s demon and other thought-experiment demons”*. When 
the entropy increase of the outside world is built into the cooling cycle, 
carried away by lost particles or scattered light, there is no trace of the 

Fig. 2 | Perfect filling of 4 × 4 × 3 and 5 × 5 × 2 sublattices. a-f, The 
five images in each row correspond to the five lattice planes (labelled 1-5). 
The colour map shows intensity. We have applied contrast enhancement (a 
threshold of ~35% of the peak intensity) to make empty sites more obvious 
in the figure. The associated grid patterns are real-time occupancy maps, 
generated by processing the five images. The two sets of images (a-c and 
d-f) are from two different experimental implementations. a, An initial, 
unsorted atom distribution. The occupancy maps are used as the basis 

of a series of site-selective state flips and state-selective translations that 
execute our sorting algorithm. b, Result after one sorting sequence with 
the goal of filling a 4 x 4 x 3 sublattice in planes 2-4. There are three 


theoretical paradox that twentieth-century information theory worked 
to resolve’. 

By contrast, our experiment is conceptually similar to Maxwell's 
thought experiment. We increase the entropy of the outside world in 
the process of determining site occupancy. At the same time, the con- 
figurational entropy goes to zero because there is only one state with 
that particular configuration. The stored occupancy information is 
then used as a guide to the execution of reversible operations that 
leave the system in a manifestly low-entropy state. Of course, that 
is also true for any sorting operation, as when checkers are arrayed 
on a board. The difference here is that most of the initial entropy of 
our system is in the initial configurational disorder, so that by meas- 
uring and sorting we considerably reduce the total system entropy. 
Maxwell’s demon collected information and acted on one particle 
at a time. By contrast, our demon obtains an occupancy map of the 
whole system, so that it can map out a plan to act on all the particles 
in parallel. 

Maxwell visualized work being extracted from the reconfigured 
system by using the demon-imposed temperature gradient to drive a 
heat engine. Work can probably not be extracted in our experiment, 
but the fact that the overall system entropy is reduced means that trap 
changes that affect all atoms in the same way can create a much colder 
gas. For instance, the experiment would pass the 1.24k_B entropy-per-particle
threshold below which there would be a Bose-Einstein
condensate if the lattice were adiabatically shut off and the atoms were

errors after this sort (one in plane 3, two in plane 4). c, Result after a
second sorting sequence starting from the distribution in b. The sorting 
goal has been reached. Atoms outside the target sublattice can be kept as 
spares, or they can be selectively state-flipped and removed by a resonant 
clearing beam. d, Another initial, unsorted atom distribution. e, The result 
after one sorting sequence with the goal of filling a 5 x 5 x 2 sublattice in
planes 2 and 3. There are four errors after this sort (one in plane 2, three 
in plane 3). f, The result after a second sorting sequence starting from the 
distribution in e. The sorting goal has been reached. The absence of spare 
atoms in f is coincidental.

Fig. 3 | Filling fraction and entropy. The empty red symbols show
the filling fraction as a function of the number of sorts for 5 x 5 x 2
(circles) and 4 x 4 x 3 (squares) target sublattices. The circles (squares) 
show results based on 85 (48) experimental implementations. The red 
horizontal dashed line is the limit associated with loss from collisions with 
background gas atoms during the 1 s required to image and sort. The solid 
blue symbols show the configurational entropy as a function of the number 
of sorts for 5 x 5 x 2 (circles) and 4 x 4 x 3 (squares) target sublattices. 
The total entropy at the beginning and at the end is the sum of the 
vibrational entropy (blue horizontal dotted line) and the configurational 
entropy; sorting reduces it by a factor of 2.44. The 1σ error bars are smaller
than the size of the symbols.

Fig. 4 | Microwave spectra showing the results of projection sideband
cooling’. Atoms start in |F = 4, m_F = −4⟩, and the curves show the
number of atoms that make an adiabatic fast-passage transition from the
|F = 4, m_F = −4⟩ to the |F = 3, m_F = −3⟩ state because atoms that remain
in |F = 4, m_F = −4⟩ are cleared by a resonant beam before detection.
Slightly rotating the polarization of one lattice beam of the x, y or z lattice
beam pairs leads to non-zero projections between different vibrational
states (vibrational quantum number ν) in that direction. Nearly all atoms
make the transition at the peaks centred at zero, where the frequency is
f = f_0 ≈ 9.19130 GHz, which corresponds to no change of vibrational
level. Nearly all atoms make the transition at the lower-frequency peaks
(at f − f_0 ≈ −15 kHz), which corresponds to the vibrational quantum
number increasing by one (Δν = +1). Atoms make the transition at the
higher-frequency peaks (Δν = −1, at f − f_0 ≈ 15 kHz) only if there is a
lower vibrational level available, which is true for all atoms except those
in the vibrational ground state. The heights of the Δν = −1 peaks thus
provide a measure of the atoms that are not in the vibrational ground state.
The empty maroon circles show the Δν = −1 and Δν = −2 sidebands in
the x direction before projection sideband cooling. The sidebands for the
y and z directions are similar (not shown). The solid maroon circles, solid
blue squares and solid green diamonds show the spectra for the x, y and z
directions, respectively, after projection sideband cooling. Each data point
in the ‘before cooling’ dataset is obtained from about 60 atoms and in the
‘after cooling’ datasets from about 400 atoms. All error bars represent
one standard deviation. The maroon, blue and green solid lines are sums
of four fitted super-Gaussians of order 4 for the x, y and z directions,
respectively. We infer from the large suppressions of the Δν = −1 and
Δν = −2 sidebands in this figure that the 3D vibrational ground state is
occupied 88.9(9)% of the time.


left in a 3D box potential’. The adiabatic timescale for making this 
approximate Mott-insulator-to-superfluid transition is far too large 
given our current lattice spacing, but it could be accomplished if the 
lattices were made in an accordion configuration’, in which the angle 
between the beam pairs could be dynamically changed to reduce the 
lattice constant by a factor of five. That would present a third path to 
quantum degeneracy of cold atoms, joining many evaporative-cooling 
experiments and one laser-cooling experiment”””. 

Maxwell’s thought experiment led to a deep understanding of 
the relationship between entropy and information. The experiment 
that we present here has several practical applications. It prepares a 
favourable initial state for a neutral-atom quantum computer with 
one atom at nearly every 3D lattice site, each cooled near its vibra- 
tional ground state. The cold array minimizes many potential errors 
in Rydberg-gate-based quantum computations. The 3D optical lattice 
allows entanglement with many near neighbours and provides favourable
scaling, often as N^(1/3) or N^(2/3), to minimize computation time and
the laser power requirements. If we can further improve cooling—for 
instance, by temporarily transferring the atoms to a lattice with smaller 
detuning, where the atoms are trapped more deeply in the Lamb-Dicke 
limit—we might be able to create large-scale entanglement through 
cold-collision gates*”*! and thus ultimately implement one-way quantum
computation*”. Our sorted array could also be used for a variety of
Rydberg-based quantum simulations with different geometries, dimensionalities
and anisotropy of interactions. For instance, the simulations
of Ising-like Hamiltonians that have recently been implemented in one- 
and two-dimensional (2D) tweezer and microtrap arrays'©*3 could be 
extended to three dimensions in our optical lattice. Our demonstrated 
coherent site-selective control’ allows the possible implementation 
of a universal quantum simulator, which might be used to implement 
Kitaev’s toric code in 2D sublattices and in 2D or 3D lattice gauge 
theories*.

METHODS


Apparatus. We load atoms from a magneto-optical trap into a 3D optical lattice
formed by three pairs of 75-μm-waist, 838.95-nm laser beams. Each lattice beam
has a power of 250 mW, giving a lattice depth of 190 μK at the central lattice site.
The two beams in each pair cross each other at 10°, yielding a lattice spacing of
4.9 μm. Two pairs are frequency-shifted relative to the third by +30 MHz and
−175 MHz to prevent mutual interference among the lattice pairs. One beam in
each pair has two electro-optic modulators in its path, aligned so that their axes are
at 45° relative to the incoming polarization, followed by a λ/4 wave plate aligned
with the incoming polarization. As the voltage on the electro-optic modulators is
increased, the polarization of this beam rotates. The angle of rotation is π when
the half-wave voltage is applied to both electro-optic modulators.
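
As a quick plausibility check on the quoted lattice spacing, the sketch below evaluates the standard two-beam interference period, d = λ / (2 sin(θ/2)), for beams of wavelength λ crossing at a full angle θ. The function name is hypothetical and the formula is a textbook relation rather than something stated in the text. With the nominal 838.95 nm and 10°, it gives a spacing of roughly 4.8 μm, close to the quoted 4.9 μm; the small difference may simply reflect rounding of the quoted crossing angle.

```python
import math


def lattice_spacing_um(wavelength_nm: float, crossing_angle_deg: float) -> float:
    """Period of the interference pattern of two beams crossing at a full angle.

    Textbook relation d = lambda / (2 * sin(theta / 2)); returns micrometres.
    """
    half_angle = math.radians(crossing_angle_deg) / 2.0
    return wavelength_nm / (2.0 * math.sin(half_angle)) / 1000.0


# Nominal values quoted in the Apparatus paragraph.
spacing = lattice_spacing_um(838.95, 10.0)
print(f"lattice spacing ~ {spacing:.2f} um")
```

The same function shows why an accordion geometry helps: increasing the crossing angle shrinks the period, down to λ/2 for counter-propagating beams.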
Projection sideband cooling. Projection sideband cooling has been described in 
detail in a previous paper!®. We give a brief overview here. Projection sideband 
cooling is similar to other sideband-cooling techniques, except that microwave 
photons, which have a very small momentum compared to the optical photons that 
are usually used, drive the vibrational-state-changing transitions. To accomplish 
this, for a given lattice direction, say x, the polarization of one of the lattice beams 
in the pair is rotated slightly, which displaces the traps experienced by atoms in 
the |F = 4, m_F = −4⟩ and |F = 3, m_F = −3⟩ states slightly relative to each other
(the same effect that we use to state-selectively translate atoms during our sort- 
ing operations). Therefore, all the vibrational wavefunctions associated with 
one magnetic sublevel have non-zero projections onto the wavefunctions of the 
other sublevel. There are consequently non-zero matrix elements for vibration- 
state-changing microwave transitions. 

Projection cooling proceeds as follows. All the atoms are prepared in
|F = 4, m_F = −4⟩ via optical pumping. The x lattice-beam polarization is rotated,
and then a Δν_x = −2 microwave adiabatic fast passage (AFP) pulse drives the
transition |F = 4, m_F = −4⟩ to |F = 3, m_F = −3⟩, followed by a modified polarization
rotation and then a Δν_x = 1 microwave pulse from |F = 3, m_F = −3⟩
to |F = 4, m_F = −4⟩ (which is a Δν = −1 pulse for atoms that start in
|F = 4, m_F = −4⟩). The AFP pulses work well regardless of the initial vibrational
state. The two AFP pulses lower ν_x by 1 for all the atoms except those initially
in ν_x = 0, which make no transitions. All the atoms for which both AFP pulses
were successful end in |F = 4, m_F = −4⟩, except those that started in ν_x = 1. The
polarization is rotated back, and an optical pumping pulse resets all the atoms to
|F = 4, m_F = −4⟩. These steps are then repeated for the y and z lattice beams, and
then the sequence is repeated 50 times. The whole cooling sequence takes about
1 s. The Δν = −2, Δν = 1 sequences minimize the number of times that an
atom has to be optically pumped, which is a particular advantage for cooling atoms
from high vibrational states. Sequentially stepping through the Cartesian directions
optimizes the final cooling steps.

We have improved the performance of our previously demonstrated projec- 
tion sideband cooling method'* considerably, from 76% to 89% occupancy of the 
ground vibrational state. This improvement results from two changes. First, we 
increased the fidelity of the Δν = −2 microwave pulse, where Δν is the microwave-driven
change in vibrational level, by separately optimizing the lattice
displacement for Δν = −2 and Δν = −1. Second, we improved the quality of the
optical-pumping-light polarization at the atoms by a factor of 5. 
Implementation of a motion step. Extended Data Fig. 1 illustrates our timing
sequence for one motion step. Before any motion, atoms are optically pumped to
the |F = 4, m_F = −4⟩ state (not shown). The atoms to be moved are transferred
to the |F = 3, m_F = −3⟩ state sequentially. The addressing lasers, directed by
micro-electro-mechanical-systems (MEMS) mirrors, cross at a target atom in the 3D array,
causing an a.c. Stark shift on its resonance frequency between |F = 4, m_F = −4⟩
and |F = 3, m_F = −3⟩ by −50 kHz with respect to the atoms that are not in the
path of either addressing laser. The addressing laser powers are ramped up over
40 μs, after which we wait for another 110 μs for our intensity lock to settle. We
drive the transition in the target atoms with a 3-ms-long AFP microwave pulse,
which involves a 12-kHz frequency sweep. The crosstalk is less than 3 × 10^−3.

To initiate motion, the polarization of one of the lattice beams is linearly rotated
by π over 3 ms by ramping the voltages on the electro-optic modulators. The atoms
in |F = 3, m_F = −3⟩ are then optically pumped to |F = 4, m_F = −4⟩ in 0.2 ms (with
an intensity of 4 mW cm^−2 and detuning of −7.5 MHz on the F = 3 to F′ = 4
transition, and 0.5 mW cm^−2 and 7.5 MHz on the F = 4 to F′ = 4 transition). The
voltages are then ramped back to zero. A final optical pumping step over 0.25 ms
ensures that all atoms are back in |F = 4, m_F = −4⟩ for the next motion step.
Measuring state-flip fidelity. To measure the efficiency of our addressing scheme,
we take an occupancy map, apply projection sideband cooling to the atoms and
optically pump them to the |F = 4, m_F = −4⟩ state. We then sequentially flip the
state of all the atoms within a 5 x 5 x 5 region to |F = 3, m_F = −3⟩ using targeted
addressing. Then, another laser beam resonant with the transition from |F = 4⟩
to |F′ = 5⟩ pushes away the atoms that were left in the |F = 4, m_F = −4⟩ state. A
new occupancy map is then generated to identify the atoms that were successfully
transferred to |F = 3, m_F = −3⟩. Averaging over 50 implementations, we measure
a state-flip fidelity of 0.986(5). However, the addressing laser beam drifts slowly
once aligned, which can decrease the state-flip fidelity by about 0.02 after about
100 sorting operations.

Measuring motion fidelities. Motion errors can occur when atoms spontaneously
emit lattice light. An atom is usually lost during motion if light scattering leaves
it in the anti-trapped state. Occasionally the atom site-hops, if it stays trapped
but follows the ‘wrong’ lattice potential. We measured the motion fidelities for
atoms in |F = 4, m_F = −4⟩ and |F = 3, m_F = −3⟩ separately (see Extended Data
Fig. 2). Atoms were first projection-sideband-cooled and optically pumped to
|F = 4, m_F = −4⟩. To find the cumulative effect of making 2N motion steps in
a given direction, we ramped up to the half-wave voltage, V_{λ/2}, of the electro-optic
modulators, then ramped down to −V_{λ/2}, and repeated the process N times.
Because no optical pumping or state flips were applied during the motions, all the
atoms moved back and forth by one lattice spacing around their initial positions.
By comparing the occupancy maps before and after these motion steps, we can
identify the percentage of atoms that successfully return to their initial positions,
which we call the motion fidelity. For motion in |F = 3, m_F = −3⟩, the sequence is
the same except that after the atoms are optically pumped to |F = 4, m_F = −4⟩, a
global microwave pulse is applied to flip the state of all atoms to |F = 3, m_F = −3⟩
before executing the motions. Each data point in Extended Data Fig. 2 is averaged
over 10 sorting operations and corrected for the loss due to collisions with
background gas atoms. A linear fit gives the fidelities per motion step. For atoms
in the |F = 4, m_F = −4⟩ state and motion in the lattice directions, the fidelities are
{0.9951(6), 0.9982(6), 0.9962(4)}, where the errors refer to one standard deviation.
The corresponding fidelities for atoms in the |F = 3, m_F = −3⟩ state are {0.9956(4),
0.9961(10), 0.9956(1)}. The calculated probability of spontaneous emission for an
atom in the vibrational ground state during a motion step is 3.5 × 10^−3.
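
The statement that a linear fit gives the fidelity per motion step can be illustrated with a short sketch. It assumes that, for small per-step error, the fraction of atoms returning to their initial sites falls roughly linearly with the number of steps, so the per-step fidelity is one plus the fitted slope. The function name and the data points are hypothetical and are not the measured values of Extended Data Fig. 2.

```python
from typing import Sequence


def per_step_fidelity(steps: Sequence[float], survival: Sequence[float]) -> float:
    """Least-squares slope of survival versus step count, returned as 1 + slope.

    Assumes survival ~ 1 - n * (1 - F) for small per-step error (1 - F).
    """
    n = len(steps)
    mean_x = sum(steps) / n
    mean_y = sum(survival) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(steps, survival))
    sxx = sum((x - mean_x) ** 2 for x in steps)
    return 1.0 + sxy / sxx


# Hypothetical, noise-free data with a 0.4% loss per motion step.
steps = [0, 10, 20, 30, 40]
survival = [1.0 - 0.004 * s for s in steps]
print(per_step_fidelity(steps, survival))
```

With real data the points would scatter around the line, and the uncertainty of the slope would set the quoted error on the fidelity.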

Sorting algorithm. We have generalized the sorting algorithm for any initial
N x N x N lattice and any final i x j x k sublattice. If i = j = N, then only balancing
and compaction steps are needed. If i, j < N, then extra motion steps in x and y
are added to move as many atoms as possible into an i x j x N sublattice from
‘outside’ before balancing and compaction (‘outside’ means the full lattice minus
the target sublattice). For example, to fill a 4 x 4 x 3 sublattice, as many atoms as
possible are first moved into a 4 x 4 x 5 region in two motion steps, one in x and
one in y, from outer y-z and x-z planes of the lattice. Balancing and compaction
are then applied to a 4 x 4 x 5 lattice rather than a 5 x 5 x 5 lattice. The simulations
that we describe below suggest that even though this procedure does not
always empty the outside planes, there are always enough atoms to fill a 4 x 4 x 3
sublattice when starting from a 50% filled 5 x 5 x 5 lattice.

The steps for balancing an i x j x N lattice to fill an i x j x k sublattice are
roughly as follows:

1. If this is the first iteration, choose a dividing plane, P, to be an x-z plane.
Otherwise, choose the dividing plane to be perpendicular (either x-z or y-z) to
that of the previous iteration. Choose P to divide the lattice into two parts, S1 and S2, that
are as similar as possible (that is, a difference of one plane between S1 and S2 is
permitted if the lattice dimension is odd).

2. If the number of z rows in S1 (S2) is n (m), the required number of atoms in S1
(S2) is k x n (k x m). Move atoms between the two sublattices until they each have
at least the required number of atoms.

3. Repeat these steps for S1 and S2 separately, stopping when each of them is just
a single z row.

Balancing guarantees that there are at least k atoms in each of the i x j z rows. These
atoms are then moved in the z direction (‘compaction’) in parallel to fill the desired
k planes, usually in the middle of the accessible lattice. The algorithm minimizes
the number of motion steps.
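
The balancing steps above can be sketched at the level of atom counts. The code below is only an illustration under simplifying assumptions: it tracks how many atoms sit in each z row, not where they are, and it does not model the actual sequence of one-site motion steps, compaction, or errors. The names `balance` and `_balance_region` are hypothetical and are not taken from the authors' code.

```python
from typing import List, Tuple

Region = Tuple[int, int, int, int]  # (x0, x1, y0, y1), upper bounds exclusive


def balance(counts: List[List[int]], k: int) -> List[List[int]]:
    """Return per-row atom counts after balancing so every z row holds >= k atoms.

    counts[x][y] is the number of atoms in the z row at (x, y).
    """
    grid = [row[:] for row in counts]
    n_rows = len(grid) * len(grid[0])
    if sum(map(sum, grid)) < k * n_rows:
        raise ValueError("too few atoms to put k in every z row")
    _balance_region(grid, k, (0, len(grid), 0, len(grid[0])), cut_y=True)
    return grid


def _balance_region(grid: List[List[int]], k: int, region: Region, cut_y: bool) -> None:
    x0, x1, y0, y1 = region
    nx, ny = x1 - x0, y1 - y0
    if nx * ny <= 1:
        return  # a single z row: nothing left to divide
    if (cut_y and ny > 1) or nx == 1:
        ym = y0 + ny // 2  # dividing plane is an x-z plane
        parts = [(x0, x1, y0, ym), (x0, x1, ym, y1)]
    else:
        xm = x0 + nx // 2  # dividing plane is a y-z plane
        parts = [(x0, xm, y0, y1), (xm, x1, y0, y1)]
    cells = [[(x, y) for x in range(a, b) for y in range(c, d)]
             for a, b, c, d in parts]
    need = [k * len(c) for c in cells]
    have = [sum(grid[x][y] for x, y in c) for c in cells]
    for short, rich in ((0, 1), (1, 0)):
        # Move atoms one at a time from the fullest row of the richer part
        # to the emptiest row of the part that is short of atoms.
        while have[short] < need[short]:
            sx, sy = max(cells[rich], key=lambda c: grid[c[0]][c[1]])
            dx, dy = min(cells[short], key=lambda c: grid[c[0]][c[1]])
            grid[sx][sy] -= 1
            grid[dx][dy] += 1
            have[rich] -= 1
            have[short] += 1
    for part in parts:
        _balance_region(grid, k, part, cut_y=not cut_y)  # alternate the cut direction


print(balance([[3, 0], [5, 4]], k=3))
```

Because each part is only ever topped up to exactly its required number, the other part always keeps enough atoms for its own rows, which is why the recursion can proceed independently on the two halves.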

The sorting algorithm can probably be improved by replacing the initial steps 
to empty the outer x-z and y-z planes by a more optimal algorithm. For instance, 
the first sort could be modified to distribute the extra atoms evenly and thus reduce 
the number of correction steps. 

Monte Carlo simulations. Monte Carlo simulations of this sorting algorithm start 
with a randomly half-filled 3D array. Errors are probabilistically applied at each 
motion step and atom loss is considered after the completion of a sort. We calculate 
a separate motion fidelity for each internal state as the average of the measured 
fidelities in the three directions. One thousand simulations were run for various 
lattice dimensions and various target sublattices. For filling a 5 x 5 x 2 or 4 x 4 x 3
sublattice from a half-filled 5 x 5 x 5 lattice, the simulations predict an average 
filling factor of about 0.97 after three sorts, in agreement with our measured filling 
factor to within the uncertainty associated with our measured errors. 
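
A toy version of the two ingredients named above, a randomly half-filled array and errors applied probabilistically at each motion step, might look as follows. This is a sketch under strong assumptions: every atom is taken to move at every step, each error is treated as atom loss, and no sorting plan is executed. The function names are hypothetical; only the per-direction fidelities are taken from the text, averaged per internal state as described.

```python
import random
from statistics import mean

# Per-step motion fidelities quoted in the Methods, averaged over x, y and z.
F4_AVG = mean((0.9951, 0.9982, 0.9962))  # atoms in |F = 4, m_F = -4>
F3_AVG = mean((0.9956, 0.9961, 0.9956))  # atoms in |F = 3, m_F = -3>


def half_filled_lattice(n: int, rng: random.Random) -> set:
    """Pick half of the n x n x n sites (rounded down) uniformly at random."""
    sites = [(x, y, z) for x in range(n) for y in range(n) for z in range(n)]
    return set(rng.sample(sites, len(sites) // 2))


def survivors_after_steps(atoms: set, fidelity: float, n_steps: int,
                          rng: random.Random) -> set:
    """Keep each atom only if all of its n_steps motion steps succeed."""
    kept = set()
    for atom in atoms:
        if all(rng.random() < fidelity for _ in range(n_steps)):
            kept.add(atom)
    return kept


rng = random.Random(0)
atoms = half_filled_lattice(5, rng)
remaining = survivors_after_steps(atoms, F4_AVG, 10, rng)
print(len(atoms), len(remaining))
```

Repeating such a run many times and averaging the outcome is what turns a single random trial into a Monte Carlo estimate.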

Real-time control. The sorting process requires changing the timing sequence
in real time. This is accomplished by combining real-time data analysis with two
field-programmable gate arrays (FPGAs). The experiment has a ‘backbone’ of a
fixed timing sequence. After the motion steps have been generated according to
the initial occupancy map, the FPGAs pause that fixed timing sequence and take
control of the electronic channels (optical pumping, electro-optic modulator
voltages, addressing, microwaves) required for sorting. The data used for sorting,
which comprise a sequence of directions for the motion steps and the lattice sites to
be addressed at each motion step, are communicated to the FPGAs by the program
that generates the occupancy map and creates the sorting plan. The FPGAs convert
the motion steps into several voltage sequences that are output synchronously.
After the motion steps have been executed, the FPGAs transfer the timing control
back to the fixed backbone, which resumes where it was paused.

Code availability. The Monte Carlo code used to model our algorithm is available 
from the corresponding author on request. 

Data availability. The underlying data used to generate the figures and conclusions 
in the paper are available from the corresponding author on reasonable request.

Extended Data Fig. 1 | Motion step. A motion step to move n atoms
is shown. n atoms are sequentially targeted by the addressing beams
and transferred from the ‘stationary’ state to the ‘motion’ state using
microwaves. The electro-optic modulator (EO) voltages are ramped up to …
spacing. After motion, the atoms are optically pumped so that they all
return to the stationary state. The EO voltages are then ramped back
down. A final optical pumping (OP) ensures optimal preparation for the
next motion step.

Extended Data Fig. 2 | Motion fidelities. a, b, Measured motion fidelity
as a function of the number of motion steps in |F = 4, m_F = −4⟩ (a) and
|F = 3, m_F = −3⟩ (b) in the x (maroon circles), y (blue squares) and z

All-inorganic perovskite nanocrystal scintillators

The rising demand for radiation detection materials in many
applications has led to extensive research on scintillators'>. The 
ability of a scintillator to absorb high-energy (kiloelectronvolt- 
scale) X-ray photons and convert the absorbed energy into low- 
energy visible photons is critical for applications in radiation 
exposure monitoring, security inspection, X-ray astronomy and 
medical radiography**. However, conventional scintillators are 
generally synthesized by crystallization at a high temperature 
and their radioluminescence is difficult to tune across the visible 
spectrum. Here we describe experimental investigations of a series 
of all-inorganic perovskite nanocrystals comprising caesium 
and lead atoms and their response to X-ray irradiation. These 
nanocrystal scintillators exhibit strong X-ray absorption and 
intense radioluminescence at visible wavelengths. Unlike bulk 
inorganic scintillators, these perovskite nanomaterials are solution- 
processable at a relatively low temperature and can generate X-ray- 
induced emissions that are easily tunable across the visible spectrum 
by tailoring the anionic component of colloidal precursors during 
their synthesis. These features allow the fabrication of flexible 
and highly sensitive X-ray detectors with a detection limit of 
13 nanograys per second, which is about 400 times lower than 
typical medical imaging doses. We show that these colour-tunable 
perovskite nanocrystal scintillators can provide a convenient 
visualization tool for X-ray radiography, as the associated image 
can be directly recorded by standard digital cameras. We also 
demonstrate their direct integration with commercial flat-panel 
imagers and their utility in examining electronic circuit boards 
under low-dose X-ray illumination. 

The nature of the atomic constituents of a scintillator plays an 
important role in the radioluminescence process of the material 
because X-ray absorption increases exponentially with atomic num- 
ber®. Although a wide range of scintillation materials containing heavy 
atoms have been characterized in detail for efficient X-ray scintil- 
lation, almost all of these materials are bulk crystals and grown by 
the Czochralski method’ at temperatures above 1,700°C. For bulk-form
scintillators, such as PbWO4 and Bi4Ge3O12, a certain distance
of exciton migration is typically needed to transport charge carriers 
for subsequent trapping by luminescence centres®. However, exces- 
sive exciton migration is detrimental because it can cause either 
radioluminescence afterglow or low-efficiency X-ray scintillation. In 
addition, conventional activator-doped scintillators, such as thallium-activated
CsI (CsI:Tl) and cerium-activated YAlO3 (YAlO3:Ce),
cannot produce tunable scintillation because of their fixed transition 
energies”!°. Despite enormous efforts, the development of scintil- 
lating materials that are low-temperature- and solution-processable, 


highly sensitive to X-rays and integrable to flexible substrates remains 
a daunting challenge. 

Recently, bulk crystals of organic—inorganic hybrid perovskites have 
been found to exhibit large X-ray stopping power’! and the ability 
of efficiently converting X-ray photons into charge carriers'*-!8. The 
direct photon-to-current conversion can be attributed to the heavy Pb 
atom” and large electron-hole diffusion lengths available in organic- 
inorganic hybrid perovskites*”*°. We reason that caesium lead halide 
perovskite nanocrystals”, which feature heavy constituent elements 
and tunable electronic bandgaps in the visible range, could be a prom- 
ising candidate for high-efficiency X-ray scintillation. An appealing 
aspect of these perovskite nanocrystals is that their unique electronic 
structures render highly emissive triplet excited states*” and anomalous 
fast emission rates”®. By virtue of the effect of quantum confinement 
and increased overlap of electron and hole wavefunctions, the spatial 
distribution of luminescence centres and X-ray-generated excitons 
can be confined within the Bohr radius of the nanocrystals. Here we 
report experimental investigations of multicolour X-ray scintillation 
from a series of all-inorganic perovskite nanocrystals and demonstrate 
their use for ultrasensitive X-ray sensing and low-dose digital X-ray 
technology. 

In a typical bulk scintillator material, incident X-ray photons can
interact with heavy atoms (for example, Pb, Tl or Ce) to produce a large 
number of hot electrons through the photoelectric effect®. These charge 
carriers are quickly thermalized to form low-energy excitons, which can 
subsequently be transported to defect centres or activators for radia- 
tive luminescence (Extended Data Fig. 1a). We thus predict that high- 
energy (kiloelectronvolt-scale) X-ray photons can be converted to 
numerous low-energy visible photons via direct bandgap emissions in 
lead halide perovskite nanocrystals (Fig. 1a). To validate this hypothesis, 
we prepared a series of perovskite nanocrystals (CsPbX3, with X = Cl, 
Br or I) by controlling the reaction of Cs-oleate with different PbX2 pre- 
cursors via a hot-injection solution method” (Extended Data Fig. 2). 
Transmission electron micrograph imaging reveals a cubic shape of the 
as-synthesized nanocrystals, with an average size of 9.6 nm (Fig. 1b). 
Remarkably, under X-ray beam excitation the perovskite quantum dots 
(QDs) yield narrow and colour-tunable emissions (Fig. 1c, Extended 
Data Fig. 3). This unique property allows multicolour, high-efficiency 
X-ray scintillation to be realized (Fig. 1d, e, Extended Data Table 1). By 
contrast, the radioluminescence spectrum of conventional bulk scintillators
(CsI:Tl, PbWO4, YAlO3:Ce and Bi4Ge3O12) is almost invariable
and exhibits a wide emission peak with a large full-width at half- 
maximum (Extended Data Fig. 1b). This inherent limitation of conven- 
tional scintillators makes it difficult to achieve multicolour visualization 
of X-ray irradiation.

Fig. 1 | Full-colour radioluminescence from perovskite nanocrystal
scintillators. a, Schematic representation of X-ray-induced luminescence
of energy hν (where h is the Planck constant and ν is the frequency),
generated in an all-inorganic perovskite lattice with a cubic crystal
structure (see main text for details). b, Low-resolution transmission
electron microscopy (TEM) image of the as-synthesized CsPbBr3
nanocrystals. The inset shows a high-resolution TEM image of a single
CsPbBr3 nanocrystal and the corresponding electron diffraction pattern
along the [100] zone axis. c, Tunable luminescence spectra of the
perovskite QDs under X-ray illumination with a dose rate of 278 μGy s^−1 at
a voltage of 50 kV. The material compositions of samples 1-12 are CsPbCl3
(1), CsPbCl2Br (2), CsPbCl1.5Br1.5 (3), CsPbClBr2 (4), CsPbCl0.5Br2.5 (5),


Inspired by the bandgap-tunable perovskite nanocrystal scintillators, 
we successfully developed a flexible prototype device for multicolour 
X-ray scintillation through a combination of solution processing and 
soft lithography (Fig. 1f, Extended Data Fig. 3d, e). The fabrication 
of the X-ray-sensing device was made possible by casting the oleate- 
capped perovskite nanocrystals onto the flexible substrate of interest. 
This flexible substrate allowed rapid X-ray multicolour visualiza- 
tion (Supplementary Video 1), which is inaccessible by current bulk 
scintillators. 

We then compared the sensitivity of the perovskite nanocrystals to
X-ray illumination with that of several of the most widely used commercial
bulk scintillators (CsI:Tl, PbWO4, YAlO3:Ce and Bi4Ge3O12).
We used low-dose irradiation of 5.0 μGy s^−1 (all doses refer to doses in
air) at 10 kV and 5 μA and found that the ability of CsPbBr3 nanocrystal
thin films (thickness of about 0.1 mm) to convert X-ray photons into
visible luminescence is comparable to that of high-efficiency CsI:Tl
bulk scintillators (thickness of 5.0 mm), whereas it compares much
more favourably (more intense by a factor of 5 or more) with other bulk
scintillators, including PbWO4, YAlO3:Ce and Bi4Ge3O12 (Fig. 1d). This
superior performance is attributed to the large X-ray stopping power
and high emission quantum yields of the lead halide QDs. Notably,
conventional QDs (for example, CdTe QDs and carbon dots) exhibit
low-efficiency X-ray-induced luminescence possibly due to weak X-ray
absorption”, and thus are not suitable for practical use as scintillators
(Fig. 1d, Extended Data Fig. 4). As a point of comparison, we also found


CsPbBr3 (6), CsPbBr2I (7), CsPbBr1.8I1.2 (8), CsPbBr1.5I1.5 (9), CsPbBr1.2I1.8
(10), CsPbBrI2 (11) and CsPbI3 (12). The insets show photographs of

the thin-film samples 3, 6 and 9, which emit blue, green and red colours, 
respectively, upon X-ray irradiation. d, Comparison of the optical 
sensitivity of various scintillator materials in response to exposure to 
X-rays produced at a voltage of 10 kV. e, CIE (Commission Internationale 
de l'Eclairage) chromaticity coordinates of the X-ray-induced visible 
emissions measured for samples 1-12. f, Multicolour X-ray scintillation 
(left, bright-field imaging; right, X-ray illumination at a voltage of 50 kV) 
from three types of perovskite nanocrystal scintillator (orange, CsPbBr2I;
green, CsPbBr3; blue, CsPbClBr2).


that typical bulk single crystals of CsPbBr3 and CH3NH3PbBr3 do not
exhibit noticeable visible emission under the same experimental con- 
ditions (Fig. 1d, Extended Data Fig. 5). The noteworthy scintillation 
performance of CsPbBr3 nanocrystals with respect to their bulk coun- 
terparts presents a compelling case for investigating the origins of the 
scintillation process in our system. This process can be explained in 
part by the lack of exciton confinement in the bulk form, in which 
discrete or quantized energy levels that give access to visible emission 
cannot be generated. 

We further investigated experimentally and theoretically the physical 
processes that govern the interaction between X-rays and perovskite 
nanocrystals. As shown in Fig. 2a, we compared the absorption coefficient
of the CsPbBr3 nanocrystals (highest atomic number Z_max = 82;
Kα = 88.0 keV) as a function of X-ray photon energy with two types of
conventional QD (CdTe, Z_max = 52, Kα = 31.8 keV; carbon, Z_max = 6,
Kα = 0.285 keV). The nature of heavy atomic constituents is critically
important for efficient X-ray scintillation, because X-ray absorption
scales with the effective atomic number, Z_eff, as Z_eff^4/(AE^3), where A is
the atomic mass and E is the X-ray photon energy®. We thus speculate
that the Pb-based perovskite nanocrystals are much more suitable for 
efficient X-ray absorption than QDs without the Pb component. We 
carried out an X-ray photoelectron spectroscopic investigation to 
record the kinetic process of electrons escaping from the CsPbBr3 nanocrystal
upon irradiation with soft X-rays (Fig. 2b). To reveal the photoionization
nature of the X-ray scintillation process under study, we

Fig. 2 | Mechanistic investigation of X-ray energy conversion by
perovskite nanocrystals. a, Measured absorption spectra of CsPbBr3, CdTe
and carbon as a function of X-ray energy. The attenuation coefficients
were obtained from ref. °°. b, X-ray photoelectron spectroscopic data
of the CsPbBr3 nanocrystals plotted against the binding energy of the
electron. The photoemission peaks Cs 3d, Pb 4f and Br 3d are indicated.
a.u., arbitrary units. c, Measurement of X-ray-induced luminescence from
the perovskite nanocrystals using synchrotron radiation. The electronic
edge energies of Pb L, Cs K and Br K (shown as red squares) fall in the
X-ray energy range 10-38 keV. The line is a guide for the eye. d, Calculated
electronic band structures of the CsPbBr3 nanocrystal. The inset shows the
Brillouin zone of the cubic-phased crystal lattice (see Methods for details).
e, Proposed mechanism of X-ray scintillation in a lead halide perovskite
nanocrystal. Upon X-ray irradiation, a high-energy electron (red circles,
e⁻) is ejected from a lattice atom through photoelectric ionization (ionizing


measured the radioluminescence of the perovskite nanocrystals in 
response to synchrotron radiation (Fig. 2c, Extended Data Fig. 6, 
Supplementary Video 2). We observed an abrupt enhancement in the 
scintillation intensity upon excitation at 14 keV, 16 keV and 36 keV, 
indicating an X-ray absorption resonance at the electronic edge of the 
Pb L, Cs K and Br K shells in the CsPbBr3 structure. Density functional 
theory calculations confirmed that the electronic band structure of 
these perovskite nanocrystal scintillators is tunable, which is associated 
with the tailorability of their valence band through control of the halide 
composition (Extended Data Fig. 7). The bandgap energy of the 
perovskite nanocrystal under study is located in the range 1.7-3 eV, 
suggesting the feasibility of using such a nanomaterial to convert an 
radiation creates an energetic electron and a hole in an inner electronic
shell). Subsequently, the ejected high-energy electron produces secondary
high-energy electrons. The generated hot charge carriers then undergo 
thermalization and produce low-energy excitons. Next, fast radiative 
recombination takes place, producing radioluminescence of energy hv in 
either a singlet (S) or triplet (T) state at the electronic band edge. f, Energy 
density on the surface of a CsPbBr3 cluster as a function of particle distance,
d, in the lattice. The red line is a fit with a Gaussian distribution function with
a fitting coefficient of 0.9752. The particle distance corresponding to the
maximum energy density is 10.32 Å. g, Schematic showing the basic design
of a perovskite-nanocrystal-based photoconductor used for X-ray sensing. A
10-μm-thick layer of CsPbBr3 QDs is spin-coated onto the substrate for X-ray
photon-carrier conversion. Gold (Au) electrodes are placed onto the QDs for
hole-electron extraction. h, Current-voltage characteristics of the
as-fabricated photoconductor, recorded with and without X-ray illumination. 


absorbed dose of ionizing radiation into visible light (Fig. 2d). In addi- 
tion, the orbital contour plots of the CsPbBr3 nanocrystal indicate that 
the presence of hole-like surface-vacancy-induced Coulomb-trapping 
states near the Fermi level beyond the valence band maximum is 
responsible for the electronically energetic confinement of excitons in 
the perovskite nanocrystal (Extended Data Fig. 8b). 
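
The argument that Pb-containing nanocrystals absorb X-rays far more strongly than CdTe or carbon dots can be illustrated with a rough calculation. The sketch below assumes the commonly used photoelectric scaling Z^4/(A E^3), represents each material only by its heaviest element, and ignores absorption edges; the atomic masses and the function names are illustrative assumptions, not values or code from this work.

```python
# Relative photoelectric absorption per unit mass, assuming a Z^4 / (A * E^3)
# scaling. Each material is represented only by its heaviest element
# (atomic number, standard atomic mass in g/mol); absorption edges are ignored.
HEAVIEST_ELEMENT = {
    "CsPbBr3": (82, 207.2),   # Pb
    "CdTe": (52, 127.60),     # Te
    "carbon": (6, 12.011),    # C
}


def relative_absorption(z: int, a: float, energy_kev: float) -> float:
    """Return Z^4 / (A * E^3) in arbitrary units."""
    return z**4 / (a * energy_kev**3)


def absorption_ratio(material_a: str, material_b: str, energy_kev: float = 30.0) -> float:
    """Ratio of the scaling factor of two materials at the same photon energy."""
    za, aa = HEAVIEST_ELEMENT[material_a]
    zb, ab = HEAVIEST_ELEMENT[material_b]
    return relative_absorption(za, aa, energy_kev) / relative_absorption(zb, ab, energy_kev)


pb_vs_te = absorption_ratio("CsPbBr3", "CdTe")
pb_vs_c = absorption_ratio("CsPbBr3", "carbon")
print(f"Pb/Te scaling ratio: {pb_vs_te:.1f}")
print(f"Pb/C scaling ratio:  {pb_vs_c:.0f}")
```

Because both materials are evaluated at the same photon energy, the E^3 term cancels in the ratio; the estimate therefore reflects only the Z^4/A dependence and should be read as an order-of-magnitude guide rather than a substitute for tabulated attenuation coefficients.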

Figure 2e presents a plausible mechanism for the high-intensity 
radioluminescence from the perovskite nanocrystals. At the initial 
conversion stage, an incident X-ray photon with energy lower than 
a few hundred kiloelectronvolts interacts with the lattice atoms of a 
perovskite nanocrystal, predominantly through the photoelectric effect. 
During this process a large number of high-energy electrons and holes 
Fig. 3 | Ultrasensitive X-ray sensing and radiography using CsPbBr3
nanocrystals. a, Radioluminescence measurements for a CsPbBr3-based
scintillator as a function of dose rate. The left inset shows
radioluminescence profiles measured at low dose rates. The detection
limit of 13 nGy s⁻¹ is derived from the slope of the fitting line, with
a signal-to-noise ratio of 3. The right inset shows a schematic of the
X-ray photodetector, which consists of a CsPbBr3 nanocrystal thin film
(about 120 μm thickness), a polydimethylsiloxane (PDMS) layer and a
photomultiplier tube (PMT). All measurements were performed three
times. Error bars are mean ± s.d. b, Measured radioluminescence decay


can be created, and electronic transport occurs between the perovskite 
nanocrystals (Fig. 2f). The hot electrons and holes are then quickly 
thermalized in the conduction and valence band edges. The X-ray- 
induced charge carriers in the perovskite nanocrystals were experimen- 
tally confirmed by measuring the current through a photoconductor 
upon X-ray illumination (Fig. 2g, h). The trapping and radiative recom- 
bination of electron-hole pairs can be controlled to produce a desired 
luminescence colour by adjusting the bandgap energy. The mechanism 
of intense X-ray scintillation could be attributed in part to the strong 
X-ray stopping power and quantum confinement effects of perovskite 
nanocrystals. Additionally, the scintillation process is dominated by 
the presence of highly emissive triplet excited states (Fig. 2e), large 
absorption cross-section within the bandgap (Extended Data Fig. 9a, b) 
and fast emission output (Extended Data Fig. 9c-e), which are characteristics
of perovskite nanocrystals.

The solution-processability of the perovskite nanocrystals makes it 
possible to fabricate a thin-film scintillator device for ultrasensitive 
X-ray detection. In this device (Fig. 3a), spin-coated CsPbBr3 nano- 
crystals are used for X-ray sensing by converting high-energy X-ray 
photons into visible emission, which is readily detectable by a photo- 
multiplier tube. A favourable characteristic of the prototype X-ray 
detector is its linear response to the X-ray dose rate, covering a range 
as broad as four orders of magnitude (Extended Data Fig. 10). The 
lowest detectable dose rate for X-ray detection is demonstrated to be
13 nGy s⁻¹. This value is about 420 times lower than the dose typically
used for X-ray diagnostics (5.5 μGy s⁻¹). This scintillation photodetector
also exhibits a very fast response (scintillation decay time,
τ = 44.6 ns) upon excitation with pulsed photons (661 keV) from a
portable ¹³⁷Cs source (Fig. 3b). The fast response to X-ray photons
of the CsPbBr3-based scintillator under excitation with a ¹³⁷Cs source
(photon energy, 661 keV). The scintillation decay time is τ = 44.6 ns.
c, Photostability of the CsPbBr3-based scintillator against continuous
X-ray irradiation (wavelength λ = 530 nm, 50 kV; top) and repeated cycles
of X-ray excitation at 30 kV with a time interval of 30 s (λ = 530 nm;
bottom). d, Schematic of the experimental setup used for real-time X-ray 
diagnostic imaging of biological samples. A beetle is placed between the 
X-ray source and a scintillation platform covered with perovskite QDs. 

e, f, Bright-field (e) and the X-ray (f) images of the sample, recorded with a 
digital camera. The X-ray images were recorded at a voltage of 50 kV. 


is critical to scintillation performance in medical radiography. The 
photostability of the perovskite nanocrystals was further examined 
under continuous or repeated cycles of X-ray illumination, as shown 
in Fig. 3c. 
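
The detection-limit figures quoted above can be checked with a few lines of arithmetic. The sketch below assumes the usual convention that the detection limit is the dose rate at which the signal equals a chosen signal-to-noise ratio times the noise, for a linear response; the function name and the example noise and slope values are hypothetical and are not taken from the measurements.

```python
def detection_limit(noise_sd: float, slope: float, snr: float = 3.0) -> float:
    """Dose rate at which signal = snr * noise, for signal = slope * dose_rate.

    noise_sd and slope must share the same signal units; the result has the
    dose-rate units used when fitting the slope.
    """
    if slope <= 0:
        raise ValueError("slope of the linear response must be positive")
    return snr * noise_sd / slope


# Values quoted in the text.
DETECTION_LIMIT_NGY_PER_S = 13.0   # lowest detectable dose rate
DIAGNOSTIC_UGY_PER_S = 5.5         # typical diagnostic dose rate

# Convert microgray to nanogray before taking the ratio.
ratio = DIAGNOSTIC_UGY_PER_S * 1000.0 / DETECTION_LIMIT_NGY_PER_S
print(f"Diagnostic dose rate / detection limit = {ratio:.0f}")

# Hypothetical example: noise of 2.0 counts and a slope of 0.5 counts per (nGy/s).
example_limit = detection_limit(noise_sd=2.0, slope=0.5)
print(f"Example detection limit = {example_limit:.1f} nGy/s")
```

The ratio evaluates to roughly 423, consistent with the statement that the detection limit is about 420 times lower than the typical diagnostic dose rate.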

To assess the suitability of the perovskite nanocrystals as scintilla- 
tors for X-ray phase-contrast imaging, we implanted a metallic needle 
into a green scarab beetle and imaged the biological specimen with 
X-rays against a background substrate comprising a thin film of 
solution-processed CsPbBr3 nanocrystals (Fig. 3d). We note that the 
CsPbBr3 nanocrystals were chosen for this demonstration because their
green emission at 530 nm matches well with the maximum wavelength 
response of a complementary metal-oxide-semiconductor sensor. As 
shown in Fig. 3e, f, owing to the large difference between the X-ray 
stopping powers of the needle and the beetle, the needle inside the 
beetle is clearly revealed by phase-contrast imaging recorded using a 
common digital camera. The concept of direct X-ray contrast imaging 
through the use of high-efficiency perovskite nanocrystals is readily 
applicable to high-throughput electronics inspection and tissue 
imaging, where common digital cameras can be conveniently used 
(Extended Data Fig. 11; Extended Data Table 2). 
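
To relate the 530-nm green emission to the 1.7-3 eV bandgap range quoted earlier, photon energy and wavelength can be interconverted with E = hc/λ. The following is a minimal sketch; the constant hc ≈ 1239.84 eV nm is a standard physical value, and the function names are illustrative assumptions.

```python
HC_EV_NM = 1239.84  # Planck constant times speed of light, in eV nm


def wavelength_to_energy_ev(wavelength_nm: float) -> float:
    """Photon energy in eV for a vacuum wavelength in nm."""
    return HC_EV_NM / wavelength_nm


def energy_to_wavelength_nm(energy_ev: float) -> float:
    """Vacuum wavelength in nm for a photon energy in eV."""
    return HC_EV_NM / energy_ev


green_ev = wavelength_to_energy_ev(530.0)
# Wavelengths corresponding to the upper (3 eV) and lower (1.7 eV) bandgap limits.
band_edges_nm = (energy_to_wavelength_nm(3.0), energy_to_wavelength_nm(1.7))

print(f"530 nm corresponds to {green_ev:.2f} eV")
print(f"1.7-3 eV spans about {band_edges_nm[0]:.0f}-{band_edges_nm[1]:.0f} nm")
```

Under this conversion the CsPbBr3 emission sits inside the stated bandgap range, and the range as a whole covers most of the visible spectrum.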

We took a step further and tested the compatibility of the perovskite
nanocrystals with commercial flat-panel X-ray detectors equipped
with a-Si photodiode arrays (Fig. 4a, b). As shown in Fig. 4c, the 
perovskite-nanocrystal-based X-ray detector shows a modulation 
transfer function of 0.72 at a spatial resolution of 2.0 line pairs per 
millimetre, which is much higher than the spatial resolution of com- 
mercially used CsI:Tl-based flat-panel X-ray detectors (0.36 at 2.0 line 
pairs per millimetre). This high spatial resolution could be ascribed to 
the lower degree of light scattering in the nanoparticle-based thin film 
compared with that occurring in commercial bulk-scintillator-based
films made of thick polycrystalline ceramics or long micropillars. We
further used the prototype device to image the internal structures of
electronic circuits and an Apple iPhone with a low X-ray dose of 15 μGy
(Fig. 4d-f). Unlike CsI:Tl scintillators, which have the issue of afterglow
luminescence (scintillation decay time of 1,000 ns), our perovskite
nanocrystals have a very fast response (44.6 ns) to X-rays, making them 
ideal for dynamic real-time X-ray imaging. 

In conclusion, we have demonstrated inorganic perovskite nano- 
crystals as a new class of scintillators that are capable of converting 
small doses of X-ray photons into multicolour visible light. When 
considering the material’s solution-processability and practical scal- 
ability, it is envisioned that these scintillators are suitable for the mass 
production of ultrasensitive X-ray detectors and large-area, flexible 
X-ray imagers. Compared to conventional CsI:Tl scintillators—whose
use is constrained by the risk of thallium poisoning, the presence of 
afterglow and high-temperature synthesis—perovskite nanocrystals 
offer several outstanding attributes, including relatively low toxicity, 
low-temperature solution synthesis, fast scintillation response and high 
emission quantum yield. Although there is still much to be learned 
regarding the origin of nanocrystal scintillation, these perovskite nanocrystals
may hold substantial promise for advancing the X-ray sensing and
imaging industry. The thermal and environmental instability issues
that are often associated with perovskite materials in photovoltaic and
light-emitting-diode applications could be largely avoided in
X-ray scintillation settings.


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
data, statements of data availability and associated accession codes are available at 
https://doi.org/10.1038/s41586-018-0451-1. 
function under 15 μGy of X-ray exposure. The blue circles and purple line
show measured values and the black line is a fit to the data. d, e, Digital
photograph of a network interface card (d) and corresponding X-ray
image obtained using the flat-panel detector (70 kV and 2.5 mGy s⁻¹
exposure for 6 ms) (e). f, Comparison of X-ray images of an Apple iPhone 
acquired with the perovskite scintillator deposited on an a-Si photodiode 
panel (left) and only with an a-Si photodiode (right). 


Received: 24 October 2017; Accepted: 21 June 2018; 
Published online 27 August 2018. 


1. Röntgen, W. C. On a new kind of rays. Science 3, 227-231 (1896).

2. Moretti, F. et al. Radioluminescence sensitization in scintillators and phosphors: 
trap engineering and modeling. J. Phys. Chem. C 118, 9670-9676 (2014). 

3. Buchele, P. et al. X-ray imaging with scintillator-sensitized hybrid organic 
photodetectors. Nat. Photon. 9, 843-848 (2015). 

4. Yaffe, M. J. & Rowlands, J. A. X-ray detectors for digital radiography. Phys. Med. 
Biol. 42, 1-39 (1997). 

5. Durie, B.G. & Salmon, S. E. Scintillator distribution in high-speed 
autoradiography. Science 190, 1093-1095 (1975). 

6. Nikl, M. & Yoshikawa, A. Recent R&D trends in inorganic single-crystal
scintillator materials for radiation detection. Adv. Opt. Mater. 3, 463-481 (2015). 

7. Weber, M. J. Inorganic scintillators: today and tomorrow. J. Lumin. 100, 35-45 
(2002). 

8. Rodnyi, P. A. Physical Processes in Inorganic Scintillators (CRC Press, Boca Raton, 
1997). 

9. Nagarkar, V. V. et al. Structured CsI(Tl) scintillators for X-ray imaging
applications. IEEE Trans. Nucl. Sci. 45, 492-496 (1998).

10. Baccaro, S. et al. Scintillation properties of YAP:Ce. Nucl. Instrum. Methods
A 361, 209-215 (1995).

11. Rowlands, J. A. Material change for X-ray detectors. Nature 550, 47-48 (2017).

12. Kim, Y. C. et al. Printable organometallic perovskite enables large-area, low-dose
X-ray imaging. Nature 550, 87-91 (2017).

13. Wei, W. et al. Monolithic integration of hybrid perovskite single crystals with
heterogenous substrate for highly sensitive X-ray imaging. Nat. Photon. 11,
315-321 (2017).

14. Wei, H. et al. Sensitive X-ray detectors made of methylammonium lead
tribromide perovskite single crystals. Nat. Photon. 10, 333-339 (2016).

15. Yakunin, S. et al. Detection of X-ray photons by solution-processed organic-
inorganic perovskites. Nat. Photon. 9, 444-449 (2015).

16. Shrestha, S. et al. High-performance direct conversion X-ray detectors based on
sintered hybrid lead triiodide perovskite wafers. Nat. Photon. 11, 436-440
(2017).

17. Wei, H. et al. Dopant compensation in alloyed CH3NH3PbBr3-xClx perovskite
single crystals for gamma-ray spectroscopy. Nat. Mater. 16, 826-833 (2017).

18. Pan, W. et al. Cs2AgBiBr6 single-crystal X-ray detectors with a low detection
limit. Nat. Photon. 11, 726-732 (2017).


© 2018 Springer Nature Limited. All rights reserved. 


19. Birowosuto, M. D. et al. X-ray scintillation in lead halide perovskite crystals. Sci. 
Rep. 6, 37254 (2016). 

20. Dong, Q. et al. Electron-hole diffusion lengths >175 μm in solution-grown
CH3NH3PbI3 single crystals. Science 347, 967-970 (2015).

21. Shi, D. et al. Low trap-state density and long carrier diffusion in organolead
trihalide perovskite single crystals. Science 347, 519-522 (2015).

22. Tan, H. et al. Efficient and stable solution-processed planar perovskite solar cells
via contact passivation. Science 355, 722-726 (2017).

23. Li, X. et al. A vacuum flash-assisted solution process for high-efficiency
large-area perovskite solar cells. Science 353, 58-62 (2016).

24. Im, J.-H., Jang, I.-H., Pellet, N., Grätzel, M. & Park, N.-G. Growth of CH3NH3PbI3
cuboids with controlled size for high-efficiency perovskite solar cells. Nat.
Nanotechnol. 9, 927-932 (2014).

25. Son, D.-Y. et al. Self-formed grain boundary healing layer for highly efficient
CH3NH3PbI3 perovskite solar cells. Nat. Energy 1, 16081 (2016).

26. Protesescu, L. et al. Nanocrystals of cesium lead halide perovskites (CsPbX3,
X = Cl, Br, and I): novel optoelectronic materials showing bright emission with
wide color gamut. Nano Lett. 15, 3692-3696 (2015).

27. Becker, M. A. et al. Bright triplet excitons in caesium lead halide perovskites.
Nature 553, 189-193 (2018).

28. Hu, F. et al. Superior optical properties of perovskite nanocrystals as single
photon emitters. ACS Nano 9, 12410-12416 (2015).

29. Swarnkar, A. et al. Quantum dot-induced phase stabilization of α-CsPbI3
perovskite for high-efficiency photovoltaics. Science 354, 92-95 (2016).

30. Hossu, M., Liu, Z., Yao, M., Ma, L. & Chen, W. X-ray luminescence of CdTe quantum
dots in LaF3:Ce/CdTe nanocomposites. Appl. Phys. Lett. 100, 013109 (2012).

31. Saidaminov, M. I. et al. Pure Cs4PbBr6: highly luminescent zero-dimensional
perovskite solids. ACS Energy Lett. 1, 840-845 (2016).

32. Kovalenko, M. V., Protesescu, L. & Bodnarchuk, M. I. Properties and potential
optoelectronic applications of lead halide perovskite nanocrystals. Science 358,
745-750 (2017).

33. Berger, M. J. et al. XCOM: Photon Cross Sections Database;
https://www.nist.gov/pml/xcom-photon-cross-sections-database (2013).


Acknowledgements This work is supported by the King Abdullah University 
of Science and Technology; the Singapore Ministry of Education (grants 
R143000627112 and R143000642112); the Agency for Science, Technology 
and Research (A*STAR) under contracts 122-PSE-0014 and 1231AFG028
(Singapore); the National Research Foundation, Prime Minister’s Office,
Singapore under its Competitive Research Program (CRP award number 
NRF-CRP15-2015-03); the National Basic Research Program of China (973 
Program, grant number 2015CB932200); the National Natural Science 
Foundation of China (21635002, 21471109, 21210001 and 21405143); and 
the Natural Science Foundation of Jiangsu Province (BE2015699). We thank 
H. Jiang, B. Deng, Z. Fang, Z. Zhou, Y. Zhang, X. Ling, M. Sun and A. Malko for 
technical assistance. 


Reviewer information Nature thanks R. Comin, W. Heiss and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 


Author contributions Q.C. and X.L. conceived and initiated the project. X.L.,
H.Y. and W.H. supervised the project and led the collaboration efforts. Q.C., X.L.,
H.Y. and W.H. designed the experiments. Q.C., J.W., L.L. and S.H. performed the
nanocrystal synthesis. Q.C., X.O. and J.L. carried out the spectral measurements.
Q.C., X.O., Y.W., Y.L., D.F., Z.Y., D.B.L.T. and A.H.L. contributed to the design
and implementation of the X-ray sensing experiments. B.H., M.B. and O.F.M.
carried out the theoretical calculations. J.A., A.A.Z. and O.M.B. prepared the
perovskite single crystals. X.G. and T.W. fabricated the photoconductor devices
and performed the photocurrent measurements. X.X. fabricated the PDMS 
moulds and measured the low-temperature scintillation spectra. Q.C. and X.L. 
wrote the manuscript. All authors discussed the results and commented on the 
manuscript. 


Competing interests The authors declare no competing interests. 


Additional information 

Extended data is available for this paper at https://doi.org/10.1038/s41586-018-0451-1.

Supplementary information is available for this paper at
https://doi.org/10.1038/s41586-018-0451-1.

Reprints and permissions information is available at http://www.nature.com/ 
reprints. 

Correspondence and requests for materials should be addressed to H.Y. or 
W.H. or X.L. 

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional 
claims in published maps and institutional affiliations. 


6 SEPTEMBER 2018 | VOL 561 | NATURE | 93 


METHODS 

Chemicals. Caesium carbonate (Cs2CO3, 99.9%), lead(II) chloride (PbCl2, 99.99%),
lead(II) bromide (PbBr2, 99.99%), lead(II) iodide (PbI2, 99.99%), oleylamine
(technical grade 70%), oleic acid (technical grade 90%), 1-octadecene (technical 
grade 90%) and cyclohexane (chromatography grade 99.9%) were purchased from 
Sigma-Aldrich. Silicon wafers were obtained from Xilika Crystal Polishing Material 
Co., Ltd (Tianjin, China). SU-8 photoresist (2050) and developer solution were 
purchased from Microchem Corp. (Newton, MA). A Sylgard 184 silicone elastomer
kit was purchased from Dow Corning for the preparation of polydimethylsiloxane
(PDMS) substrates. Crystals of CsI:Tl, Bi4Ge3O12, YAlO3:Ce and PbWO4
scintillators were purchased from Zhonghelixin Co., Ltd (Chengdu, China). CdTe 
QDs were obtained from Xingzi New Material Technology Development Co., Ltd 
(Shanghai, China). Unless otherwise noted, all the reagents were used without 
additional treatment. 

Synthesis of Cs-oleate as a caesium precursor. In a typical synthesis procedure, 
Cs2CO3 (0.4 g, 1.23 mmol), oleic acid (1.25 ml) and octadecene (15 ml) were added to
a two-neck round-bottom flask (50 ml). The resulting mixture was heated to 100°C
under vigorous stirring and vacuum conditions for 0.5 h. After that, a nitrogen
purge and vacuum were alternately applied to the flask three times to remove
moisture and O2. Subsequently, the mixture was heated to 150°C and the solution
became clear, indicating the completion of the reaction between Cs2CO3 and oleic
acid. The Cs-precursor solution was kept at 150°C in a nitrogen atmosphere before 
the synthesis of perovskite nanocrystal. 

Synthesis of CsPbX3 (X = Cl, Br or I) nanocrystals. The CsPbX3 perovskite QDs
were synthesized by a modified hot-injection procedure. In a typical experiment,
PbX2 (0.36 mmol each for X = Cl, Br or I), oleic acid (1.0 ml), oleylamine
(1.0 ml) and octadecene (10 ml) were added to a two-neck round-bottom flask
(50 ml). The resulting mixture was heated to 100°C under vigorous stirring and
vacuum conditions for 0.5 h, at which time the moisture residue was removed
by purging with nitrogen and vacuum suction. Then the mixture was heated to
160°C until the PbX2 precursors dissolved completely. A hot Cs-oleate precursor
solution (1 ml) was injected quickly into the above reaction mixture. After 5 s of 
reaction, the flask was transferred into an ice bath. The CsPbX3 QDs were obtained 
by centrifugation at 13,000 r.p.m. for 10 min and stored in 4 ml of cyclohexane 
before further use. Mixed-halide perovskite QDs were synthesized to tune the 
luminescence colour. Samples 1-12 correspond to the as-synthesized CsPbCl3 (1),
CsPbCl2Br (2), CsPbCl1.5Br1.5 (3), CsPbClBr2 (4), CsPbCl0.5Br2.5 (5), CsPbBr3 (6),
CsPbBr2I (7), CsPbBr1.8I1.2 (8), CsPbBr1.5I1.5 (9), CsPbBr1.2I1.8 (10), CsPbBrI2 (11)
and CsPbI3 (12) QDs.

Growth of lead halide perovskite single crystals. The growth of CsPbBr3 single
crystals was carried out according to a method described in the literature. In a
typical procedure, CsBr (0.64 g, 3 mmol) and PbBr2 (2.2 g, 6 mmol) were dissolved
in 3 ml of dimethyl sulfoxide and stirred for 1 h. Subsequently, 1.5 ml of the mixture
was transferred into a vial heated at 60°C, and the temperature was raised to
100°C with a heating rate of 10°C h⁻¹. At 100°C, the solution was filtered and
then gradually heated to 120°C. We observed the growth of small-sized crystals
with increasing temperature. The resulting crystals were washed with hot dimethyl
sulfoxide and dried in vacuo at 100°C for 1 h. For the synthesis of CH3NH3PbBr3
(MAPbBr3) single crystals, a mixture of PbBr2 and CH3NH3Br (1.5 mol each) was
dissolved in a solution of N,N-dimethylformamide (1 ml) at room temperature.
The solution was purified by passing through a polytetrafluoroethylene filter with
a pore size of 0.22 μm. The growth of the MAPbBr3 single crystals was carried out
in an oil bath heated at 60°C and under ambient pressure. 
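
The mass and molar quantities quoted for the caesium and lead precursors in the procedures above can be cross-checked with a short calculation. The sketch below uses rounded standard atomic masses; the helper names and the 2% tolerance are illustrative assumptions.

```python
# Rounded standard atomic masses in g/mol.
ATOMIC_MASS = {"Cs": 132.905, "Pb": 207.2, "Br": 79.904, "C": 12.011, "O": 15.999}


def molar_mass(formula: dict) -> float:
    """Molar mass in g/mol from an element -> count mapping."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def millimoles(mass_g: float, formula: dict) -> float:
    """Amount of substance in mmol for a given mass in grams."""
    return 1000.0 * mass_g / molar_mass(formula)


# (mass in g, composition, amount in mmol quoted in the text)
REAGENTS = {
    "Cs2CO3": (0.4, {"Cs": 2, "C": 1, "O": 3}, 1.23),
    "CsBr": (0.64, {"Cs": 1, "Br": 1}, 3.0),
    "PbBr2": (2.2, {"Pb": 1, "Br": 2}, 6.0),
}

computed = {name: millimoles(mass, formula) for name, (mass, formula, _) in REAGENTS.items()}
for name, (_, _, quoted) in REAGENTS.items():
    print(f"{name}: computed {computed[name]:.2f} mmol, quoted {quoted} mmol")
```

All three computed amounts agree with the quoted values to within rounding, which also confirms the 1:2 CsBr:PbBr2 molar ratio used for the single-crystal growth.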

Synthesis of fluorescent carbon dots. Fluorescent carbon dots were synthesized 
by a hydrothermal method. In a typical experiment, ammonium citrate (0.972 g, 
4.0 mmol) was first dissolved in a 20-ml water solution. The solution was then 
transferred into a 30-ml Teflon-lined vessel at room temperature while stirring. 
Subsequently, the solution was heated to 190°C and kept at that temperature for 
10 h. After cooling to room temperature, the product was purified using dialysis, 
and the cut-off molecular weight of the dialysed membrane was equivalent to 
about 2,000. The carbon dot solution was concentrated using a rotary evaporator.

Preparation of silica-coated perovskite nanocrystals. The stability of perovskite
nanocrystals was improved by coating with a silicon dioxide (silica) layer according
to the literature. In a typical procedure, the CsPbBr3 QDs, dispersed in cyclohexane
(2 ml), were introduced into a 50-ml flask containing 10 ml of toluene solution
(99.5%; AnalaR NORMAPUR). 100 μl of tetramethoxysilane was injected quickly
into the mixture at room temperature. After stirring for 2 h, the products were
isolated through centrifugation at 13,000 r.p.m. for 8 min. The silica-coated CsPbX3
(X= Br or I) QDs were either dispersed in cyclohexane or dried in air. 

Physical characterization. Powder X-ray diffraction characterization was carried 
out by an ADDS wide-angle X-ray powder diffractometer with Cu Kα radiation
(wavelength, λ = 1.54184 Å). TEM imaging was performed using a FEI Tecnai G20
transmission electron microscope with an accelerating voltage of 200 kV. X-ray
photoelectron spectroscopy analysis was carried out using a Thermo ESCALAB
250Xi instrument equipped with Al Kα monochromatized X-rays at 1,486.6 eV.
Absorption spectra were measured by an ultraviolet-visible spectrophotometer
(UV-2450, Shimadzu, Japan). Photoluminescence and radioluminescence spectra
were obtained by an Edinburgh FS5 fluorescence spectrophotometer (Edinburgh
Instruments Ltd, UK) equipped with a miniature X-ray source (Amptek, Inc.).
Photographs of the X-ray-induced luminescence were acquired with a digital 
camera (Nikon D7100 with AF Micro-Nikkor 60mm f/2.8D). For the time- 
resolved photoluminescence measurements, a pulsed excitation source was used. 
The scintillation decay measurement was carried out at the Institute of High Energy 
Physics of the Chinese Academy of Sciences with a ¹³⁷Cs source used for the pulsed
excitation. The effective scintillation decay time ($\tau_{\mathrm{eff}}$) can be calculated using the
following formula:

$$\tau_{\mathrm{eff}} = \frac{1}{I_0}\int_0^{\infty} I(t)\,\mathrm{d}t$$

where $I(t)$ and $I_0$ denote the radioluminescence (or photoluminescence) intensity
as a function of time, t, and the maximum intensity, respectively.
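
As an illustration of this definition, the effective decay time (the integral of the intensity over time divided by the maximum intensity) can be evaluated numerically from a sampled decay curve. The sketch below assumes a synthetic single-exponential decay with a 44.6-ns time constant and uses trapezoidal integration; the function name and the sampling grid are hypothetical, and real decay traces would additionally need background subtraction.

```python
import math


def effective_decay_time(times, intensities):
    """Integrate I(t) with the trapezoidal rule and divide by the peak intensity."""
    if len(times) != len(intensities) or len(times) < 2:
        raise ValueError("need two or more matching time and intensity samples")
    peak = max(intensities)
    area = 0.0
    for k in range(1, len(times)):
        dt = times[k] - times[k - 1]
        area += 0.5 * (intensities[k] + intensities[k - 1]) * dt
    return area / peak


# Synthetic single-exponential decay sampled every 0.1 ns from 0 to 1,000 ns.
tau_ns = 44.6
times_ns = [0.1 * k for k in range(10001)]
intensities = [math.exp(-t / tau_ns) for t in times_ns]

tau_eff_ns = effective_decay_time(times_ns, intensities)
print(f"Effective decay time: {tau_eff_ns:.1f} ns")
```

For a single-exponential decay the effective decay time reduces to the exponential time constant, so the numerical result recovers the input value; for multi-exponential decays it gives an intensity-weighted measure instead.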

Measurement of photoluminescence quantum yield. The quantum yield was 
determined with an optical spectrometer equipped with an integrating sphere. 
Perovskite QDs were dispersed in cyclohexane. The excitation and luminescence 
emission were detected by a photomultiplier tube (PMT) through total internal 
reflection in the integration sphere. The photoluminescence quantum yield (PLQY) 
was calculated according to $\mathrm{PLQY} = P_{\mathrm{sample}}/(S_{\mathrm{ref}} - S_{\mathrm{sample}})$, where $S_{\mathrm{ref}}$ and $S_{\mathrm{sample}}$
are the excitation light intensities not absorbed by the solvent and the sample,
respectively, and $P_{\mathrm{sample}}$ is the integrated emission intensity of the sample (Extended
Data Fig. 5e). 
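
The quantum-yield expression is a ratio of emitted to absorbed photons and can be sketched as follows. The function name and the numerical values are hypothetical and serve only to show how the three measured quantities combine.

```python
def photoluminescence_quantum_yield(p_sample: float, s_ref: float, s_sample: float) -> float:
    """PLQY = P_sample / (S_ref - S_sample).

    p_sample: integrated emission intensity of the sample
    s_ref:    excitation intensity not absorbed by the solvent (reference)
    s_sample: excitation intensity not absorbed by the sample
    """
    absorbed = s_ref - s_sample
    if absorbed <= 0:
        raise ValueError("the sample must absorb part of the excitation light")
    return p_sample / absorbed


# Hypothetical integrated counts from an integrating-sphere measurement.
plqy = photoluminescence_quantum_yield(p_sample=450.0, s_ref=1000.0, s_sample=400.0)
print(f"PLQY = {plqy:.2f}")
```

The guard against a non-positive denominator reflects the physical requirement that the sample absorbs more excitation light than the solvent reference.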

Measurement of exciton binding energy in perovskite nanocrystal scintillator. 
The exciton binding energy ($E_b$) was estimated by measuring the temperature-dependent
radioluminescence intensity. By fitting data derived from the integrated
luminescence intensity of the CsPbBr3 QD scintillator with the Arrhenius formula:

$$I(T) = \frac{I(T_0)}{1 + C\,T\exp[-E_b/(k_B T)]}$$

where $I(T_0)$ is the radioluminescence intensity at the low-temperature ($T_0$) limit,
$k_B$ is the Boltzmann constant, C is a constant and T is the temperature, we obtain
an exciton binding energy of 49 meV.

Fabrication of perovskite nanocrystal scintillator films on PDMS substrates. 
The PDMS substrates were fabricated by a standard soft lithography microfabri- 
cation technique. Briefly, a photomask was first designed using Adobe Illustrator 
CS6. A 60-μm-thick layer of negative photoresist (SU-8 2015; 2,500 r.p.m., 60 s)
was spin-coated onto a silicon wafer (3 inch; 1 inch = 2.54 cm). The wafer was
prebaked at 60°C for 10 min and then at 85°C for 5 min. The resulting photoresist 
on the wafer was irradiated by an ultraviolet lamp for 20 s, followed by a post- 
baking treatment in an oven at 75°C for 5 min. Next, the desired microstructure on 
the silicon wafer was produced using a developer solution. The PDMS substrates 
were fabricated with a premixed PDMS prepolymer and curing agent (10:1 by 
mass) under vacuum conditions, followed by heat treatment at 80°C for 2 h. The 
PDMS replicas were carefully peeled off from the master. Finally, perovskite QDs 
dispersed in cyclohexane were coated onto the PDMS substrate.

Radioluminescence measurement for perovskite nanocrystal scintillators. The
measurement of X-ray-induced luminescence was performed using a solid film 
comprising perovskite QDs. We note that perovskite QDs dispersed in solution 
are not suitable for scintillation characterization under X-ray excitation, because a 
low population of QDs in solution is inefficient for X-ray absorption. Unlike under 
visible-light excitation, a quartz cuvette is not used for measuring scintillation 
luminescence under X-ray excitation, because the excitation can be strongly 
absorbed by the cuvette. The scintillation decay times of CsI:Tl, Bi4Ge3O12,
YAlO3:Ce and PbWO4 crystal scintillators are listed in Extended Data Table 1.

X-ray photoconductor devices. To fabricate the X-ray photoconducting device,
silica wafers with a 300-nm-thick SiO2 layer were first cleaned by sonication in
acetone, ethanol and deionized water separately. After drying with flowing nitrogen,
the substrates were treated with oxygen plasma for 6 min. The solution of
CsPbBr3 QDs was spin-coated onto the Si/SiO2 substrates at 500 r.p.m. for 30 s and
subsequently annealed at 100°C for 5 min. This procedure was repeated three times
to produce a film with a thickness of about 10 μm. After that, 100-nm-thick gold
electrodes were deposited onto the CsPbBr3 QD film by thermal evaporation, using
a shadow mask to control the size of the deposition. For the X-ray photon-to-current
measurement, we used a commercially available, miniaturized X-ray tube
(Amptek). The target in the X-ray tube was made of gold and the maximum output 
was 4 W. In our measurement, the X-ray tube voltage was kept at 50 kV while the 
peak X-ray energy was set at 10 keV with an Al/W filter and a 2-mm-diameter brass 
collimator. The distance between the X-ray source and the X-ray photoconducting 
device was about 30 cm. The current-voltage measurement of the devices was 
performed using a Signotone Micromanipulator S-1160 probe station equipped
with a Keithley 4200 Semiconductor Parametric Analyzer. All the experiments 
were carried out at ambient conditions. 

X-ray scintillation detector and imaging. The X-ray scintillation detector was 
constructed by coating a PDMS substrate with perovskite QDs (layer thickness
of 120 μm), followed by the attachment of a PMT. In a typical procedure, a solution
of CsPbBr3 QDs was spin-coated onto the PDMS substrate. The PDMS substrate
was then coupled to the PMT for maximized collection of visible photons. For
X-ray detection, a range of X-ray dose rates (0.013-278 μGy s⁻¹) was applied by
adjusting the current and voltage of the X-ray source. For X-ray imaging, a plastic 
disk coated with CsPbBr3 nanocrystals was used. A green scarab beetle implanted 
with a metallic needle was employed as a specimen for X-ray imaging. 

In vivo multicolour optical bioimaging. All the animal experiments were per- 
formed in compliance with institutional guidelines. Silica-coated perovskite QDs 
(CsPbBr3, CsPbBr1.5I1.5, CsPbBr1.2I1.8; 100 μg, 50 μl) dispersed in a phosphate-buffered
saline buffer solution were subcutaneously injected into the Balb/c
nude mice (age, 4-6 weeks; weight, 18 g). An animal imaging system (Advanced 
Molecular Imager, Cold Spring Biotech Corp., Shanghai) equipped with an X-ray 
source was used to carry out in vivo radioluminescence imaging of the mice. The 
exposure time for in vivo imaging was set at 1 s. For in vivo multicolour optical 
imaging, optical filters (530 nm, 630 nm and 670 nm) were used to selectively 
record the X-ray-induced luminescence at different emission wavelengths 
(Extended Data Fig. 11). 

Construction of perovskite-based flat-panel X-ray imaging system. The a-Si 
photodiode array backplane was customized for commercial a-Si/CsI:Tl detectors
supplied by iRAY Technology Shanghai, Inc. The active area of a photodiode array
is 43.0 cm × 43.0 cm, consisting of 3,072 × 3,072 square pixels with a pixel pitch of
139 μm. CsPbBr3 nanocrystals were first dispersed in cyclohexane. We coated the
photodiode arrays (8.0 cm × 8.0 cm) with a thin film (thickness, 75 μm) of nanocrystals
using a solution-processing method. After evaporation of cyclohexane,
an aluminium film (40 μm thick) was added under vacuum, in a packaging process
similar to that used in commercial CsI:Tl-based X-ray imaging systems. The aluminium
film was used to protect the scintillators from moisture and light soaking.
We note that a reflecting layer was coated on the surface of the aluminium film to
enhance the light collection into the photodiode elements. The power consumption
was 25 W for full-image acquisition and the X-ray source was operated at a
voltage of 70 kV. X-ray imaging of electronic circuit boards was acquired with an
X-ray exposure of 2.5 mGy s⁻¹ for 6 ms, resulting in a dose of 15 μGy. The spatial
resolution was determined by measurement of the modulation transfer function. 
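
As a quick consistency check on the figures quoted for the flat-panel system, the following minimal sketch recomputes the delivered dose from the dose rate and exposure time, and the detector width from the pixel count and pixel pitch. The function names are illustrative assumptions and are not part of the original work.

```python
def dose_uGy(dose_rate_mGy_per_s: float, exposure_ms: float) -> float:
    """Dose in microgray: (mGy/s) x (ms) = uGy, because 1e-3 x 1e-3 = 1e-6."""
    return dose_rate_mGy_per_s * exposure_ms


def panel_width_cm(pixels_per_side: int, pitch_um: float) -> float:
    """Active width in centimetres from pixel count and pitch (1 cm = 1e4 um)."""
    return pixels_per_side * pitch_um / 1e4


if __name__ == "__main__":
    print(dose_uGy(2.5, 6.0))         # 2.5 mGy/s applied for 6 ms
    print(panel_width_cm(3072, 139))  # 3,072 pixels at a 139-um pitch
```

The first value reproduces the quoted dose of 15 μGy; the second gives about 42.7 cm, in line with the nominal 43.0 cm active area.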
Radioluminescence analysis using synchrotron radiation. The characterization 
of the yield of X-ray-induced luminescence near electronic-shell edges was con- 
ducted using the synchrotron beamline in the Shanghai Synchrotron Radiation 
Facility. A thin film of CsPbBr3 nanocrystals was cast onto a PDMS substrate. 
The X-ray excitation energies were 10-38 keV, and a portable spectrophotometer 
(Ocean Optics) was used to measure the radioluminescence. 

Density functional theory calculation. For the calculation of the projected partial 
density of states (PDOS), density functional theory (DFT) calculations were carried 
out. We used the Cambridge Serial Total Energy Package (CASTEP) source code 
to perform the calculations with the rotation-invariant DFT+U method. In a typical
procedure, a simple cubic phase with Pm3̄m symmetrical lattice arrangement
was modelled for bulk-phase CsPbBr3. Norm-conserving pseudopotentials of the
Cs, Pb and Br atoms were generated by the OPIUM code in the Kleinman–Bylander
projector form. A nonlinear partial core correction and a scalar relativistic
averaging scheme were used to treat the spin-orbit coupling effect. In
particular, we treated the 4s, 4p and 4d states of the Br atoms as valence states, the
5s, 5p and 5d states for Cs atoms, and the 5d, 6s and 6p states for Pb atoms. The
Rappe–Rabe–Kaxiras–Joannopoulos method was chosen to optimize the pseudopotentials
during electronic minimization, particularly using a blocked-Davidson-scheme
matrix diagonalization.

For the calculations of the electronic states in the CsPbBr3 material, we used
self-consistent determination for on-site U correction on the localized p orbitals 
of Br sites to correct the on-site Coulomb energy of the electron spurious self- 
energy. The on-site electronic self-energy and related wavefunction relaxation 
in the semicore p, d or f orbitals in mixed-valence elements were used to obtain 
accurate orbital eigenvalues for the electronic structures and transition levels. An 
ab initio two-way crossover searching calculation was performed by two func- 
tionally compiled CASTEP-17 source codes. Using the self-consistent determina- 
tion, on-site Hubbard U parameters for different orbitals of Br and Pb sites were 
obtained. Further, a time-dependent DFT calculation was performed with a
two-electron-based Tamm–Dancoff approximation imported from the self-consistently
corrected ground-state wavefunctions.

Luminescence mechanisms in perovskite nanocrystal scintillator. Two energy- 
transfer mechanisms for recombination luminescence exist in the perovskite 
nanocrystal scintillator. One is anisotropic electron and hole transport within the 
reciprocal Brillouin zone, which leads to a difference between the electron effective
mass along different paths and the excitonic binding energy. This difference illus- 
trates the probability that electronic transport within reciprocal band structures is 
directionally selected for luminescence. Another plausible route is the annihilation 
of shallow acceptor levels (Pb vacancies), which induces an absence of recombina- 
tion centres. Such intrinsic lattice defects usually produce low-excitation energy 
levels compared with the ideal lattice and consequently hinder energy transfer 
during the light-absorption process. 

Anisotropic transport-induced luminescence contrast. Using DFT calculations, the 
bandgap of a bulk CsPbBr3 crystal was calculated to be about 2.02 eV, whereas in
the CsPbBr3 QDs the bandgap increases slightly to 2.22 eV (Fig. 2d). This is because 
the large surface-to-volume ratio in the CsPbBr3 nanocrystal induces an evident 
quantum confinement, thus leading to an enlarged vacuum Coulomb barrier for 
electronic transitions. 

We chose reciprocal, highly symmetrical points and lined up two different paths
(X→R→M→Γ→R and Γ→R→M) within the Brillouin zone. As shown in the
electronic band structure plot in Fig. 2d, the valence band edge and the conduction
band edge are located at the same point R(1/2, 1/2, 1/2). Using effective mass
theory, the effective mass for electrons and holes was found to be anisotropic along
these two directions. In the directions X→R→M→Γ→R and Γ→R→M, the effective
masses for electrons were calculated to be 0.037 and 0.11 and the effective
masses for holes were calculated to be 0.12m0 and 0.24m0, respectively, where m0
is the rest mass of the electron. Thus, the Wannier–Mott exciton binding energy
and radius are different in these two directions.
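
For reference, the textbook hydrogen-like (Wannier–Mott) expressions show how direction-dependent effective masses translate into direction-dependent exciton properties. This is a sketch under stated assumptions: the relative permittivity ε_r and the symbols μ, E_b and a_X are notation introduced here, and the text does not state which permittivity or which exact expressions were used in the calculations.

```latex
\mu = \frac{m_e^{*}\, m_h^{*}}{m_e^{*} + m_h^{*}}, \qquad
E_b = \frac{\mu}{m_0\, \varepsilon_r^{2}}\, \mathrm{Ry}, \qquad
a_X = \frac{\varepsilon_r\, m_0}{\mu}\, a_0 ,
\qquad \mathrm{Ry} \approx 13.6\ \mathrm{eV}, \quad a_0 \approx 0.529\ \text{\AA}
```

In this model the binding energy E_b scales linearly with the reduced mass μ and the exciton radius a_X scales inversely with it, so any anisotropy in the electron and hole effective masses carries over directly to both quantities.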

By converting the reciprocal Brillouin zone area into a real-space diagram, 
we found that the Cs site at (0, 0, 0) in the body-centred area is different in 
these two paths. Point R(1/2, 1/2, 1/2) denotes the position of the Pb site, whereas 
M(1/2, 1/2, 0) represents the location of the Br site. Owing to the different effective 
electron masses of the two paths, the path Γ→R→M is energetically favourable to
the transport of electrons and holes. By contrast, the X→R→M direction is ruled
out because the binding energies are too large to release electrons and holes for
recombination. This implies a charge transfer process from the Cs site to the Pb site
at the cubic apex point, finally reaching the Br site at the middle point of the cubic
edge, namely, through the path Γ→R→M (Extended Data Fig. 7a).

Furthermore, we used an orbital calculation to retrieve the electronic and hole
orbitals from the electronic band structure. Our results show that bound electrons
stay at the Br sites at a non-bonding state in the p-π orbital level (Extended Data
Fig. 7b). Meanwhile, bound holes were found to stay at the Pb site with an s-orbital
spherical distribution (Extended Data Fig. 7c). The orbital contour plots reveal the
localization of electrons and holes at perfect lattices. The stabilized charge state of
the body-centred Cs site is Cs+ because electrons are transferred from the Pb site
to the Br site through the ionization of one s-state electron.

Intrinsic lattice defects in perovskite nanocrystal scintillators. Intrinsic lattice defects
in perovskite nanocrystal scintillators are responsible for both luminescence and
the quenching effect. Here we consider the low-energy native defects of a Br
vacancy (V_Br) and a Pb vacancy (V_Pb). For V_Br, the absence of one Br atom leaves
one electron occupying the empty p orbitals of the nearest neighbouring Pb site.
Accordingly, localized electronic orbitals were modelled for V_Br in the neutral (V_Br^0)
and singly positive (V_Br^+) states (Extended Data Fig. 7d, e). Because the charge
bound at the nearby Pb site is positively ionized, the p electronic orbitals of the V_Br^+
site show a transition from the correlated state to a repulsive behaviour between
the two neighbouring Pb sites. The PDOS analysis also shows that the electronic
level of V_Br, which is localized at the bottom of the conduction band edge, serves
as a shallow donor. The V_Pb lattice defects produce an acceptor trap centre and a
spin-polarized state (Extended Data Fig. 7f, g). The singly negative state of a Pb
site (V_Pb^−) with one electron already captured could partially passivate the acceptor
trap site with weakened charge localization. The process of local geometrical relaxation
on different charge states of V_Pb indicates that the Cs+ sites centrosymmetrically
move towards the V_Pb centre (Extended Data Fig. 7h). Upon the occurrence
of a strong clustering effect to form V_Br near the V_Pb site, the electronically active
acceptor trap centre can be completely terminated. In this case, formation of local
Cs-Br motifs is possible. We also considered Cs vacancy (V_Cs) sites in two charge
states in the lattice and found no effects on the electronic properties of the host
lattice. Accordingly, V_Br and V_Pb produce the electronic and hole levels for
luminescence recombination in the form of photon emission (Extended Data Fig. 7i).

Further, we calculated the excited energy and thermodynamic transition levels 
in the CsPbBr3 QDs. From the internal bulk lattice to the surface region, the dimensions
of the material decrease but its surface-to-volume ratio is increased. The
electronic donor trap levels experience a transition from a localized state below 
the conduction band edge to a delocalized state in the conduction band (Extended 
Data Fig. 7j). During the process of radiation ionization and release, the number of 
bound electrons is increased accordingly. By contrast, with a decreased dimension 
of the host lattice, the trapping ability of the acceptor is decreased where the hole 
level shifts from a delocalized state in the valence band to a localized state above 
the valence band. Therefore, the quenching effect in the bulk CsPbBr3 materials
for luminescence recombination is caused by annihilation of a hole level that is 
deeply buried in the valence band. The structural transition from the QD to the 
bulk form occurs from the surface to the bulk, thus the hole level is annihilated. 

Intrinsic quantum confinement in CsPbBr3 nanocrystals. The intrinsic effect of 
quantum confinement in CsPbBr3 nanocrystals was examined by additional theoretical
study of their surface electronic properties. In a typical procedure, we
first built a simplified model of the CsPbBr3 structure composed of 293 atoms
with a particle size of 12.06 Å, namely, a lattice group (6 × 6 × 6) truncated from
bulk CsPbBr3 crystal, using the radial coordinated structural formation program
(RCSEP) (Extended Data Fig. 8a). The DFT-calculated orbital contour
plots of the CsPbBr3 QD show the highest occupied molecular orbital (HOMO),
the lowest unoccupied molecular orbital (LUMO) and surface-vacancy-induced
Coulomb trapping (SVIC) states (Extended Data Fig. 8b). The electronic structures
show that the SVIC state is formed owing to unsaturated p orbitals of surface
Br sites and is electronically localized at the apex corner regions of the QD.
Additionally, the PDOS of the CsPbBr3 QD indicates that such SVIC sites are 
mainly distributed near the Fermi level, beyond the valence band maximum, thus 
exhibiting a hole-like feature and being strongly confined by the LUMO orbitals 
(Extended Data Fig. 8c). This leads to the suppression of the long-distance 
transport of electron-hole pairs across the particle surface or between particles. 
Furthermore, a model was built to perform a simulation of the energetic evolution
on the surface of the QDs as a function of particle distance (Extended Data Fig. 8d).
To investigate surface confinement, we further calculated the relative energy level
of the SVIC state as a function of particle distance (Extended Data Fig. 8d). At
a distance of 12.06 Å from the particle surface, the SVIC energy
implies a strong hole-like confinement, merely 0.056 eV above the valence band
maximum. Our results suggest that the mean confinement path of electronic transport
within the lattice is approximately 10.32 Å (Fig. 2f). Indeed, the intrinsic
energetics on the surface of the QDs is reasonable for energy confinement of the 
thermalized low-energy excitons inside the nanocrystal, resulting in a high yield 
of X-ray scintillation light. 

Data availability. The data that support the findings of this study are available 
from the corresponding authors upon reasonable request. 
Extended Data Fig. 1 | Schematic representation of scintillation
mechanism and X-ray-induced luminescence in bulk inorganic
scintillators. a, In bulk inorganic materials, X-ray photons absorbed by
the lattice atoms can generate hot charge carriers, followed by exciton …
frequency during excitonic luminescence; ν_A, photon frequency during
activator (A) luminescence. b, Scintillation properties of CsPbBr3 QDs
and commercial bulk inorganic materials under X-ray excitation. The
full-width at half-maximum (FWHM) represents the spectral width of …
Extended Data Fig. 2 | Physical characterization of as-synthesized
perovskite QDs. a, TEM images of the as-prepared cubic-phased
nanocrystals (left) and the corresponding size distribution of the
nanocrystals (right). The samples are CsPbCl3, CsPb(Cl/Br)3, CsPbBr3,
CsPb(Br/I)3 and CsPbI3 nanocrystals (from top to bottom). Insets are
images of perovskite nanocrystals dispersed in cyclohexane, recorded
under 365-nm ultraviolet light excitation. b, Powder X-ray diffraction
patterns for typical ternary and mixed-halide CsPbX3 (X = Cl, Br or I)
nanocrystals. All peaks are indexed in accordance with the cubic-phased
CsPbBr3 structure (Joint Committee on Powder Diffraction Standards file
(PDF) number 54-0752). c, d, Dark-field scanning transmission
electron micrographs (STEM; JEM-F200HR) of CsPbBr3 nanocrystals.
e, Elemental mapping of Br, Pb and Cs for the nanocrystals, obtained
from the area marked by the rectangular box in d using energy-dispersive
X-ray spectroscopy. f, Atomically resolved dark-field STEM image
of a single CsPbBr3 nanocrystal, showing Cs and Pb lattice atoms.
g, Energy-dispersive X-ray spectrum of the as-prepared CsPbBr3
perovskite nanocrystals, confirming the stoichiometric composition of the
CsPbBr3 nanocrystals. We note that strong Cu signals come from the TEM
copper grid.
Extended Data Fig. 3 | Multicolour-emitting perovskite QD scintillators
upon X-ray irradiation. a, Demonstration of X-ray-induced luminescence
modulation using CsPbX3 QDs of different compositions (X = Cl, Br or I).
b, Multicolour X-ray scintillation from CsPb(Cl/Br)3, CsPbBr3 and
CsPb(Br/I)3 nanocrystals cast on a PDMS substrate. The X-ray dose rate is
278 μGy s⁻¹. c, Typical photographs of radioluminescence from
CsPb(Cl/Br)3, CsPbBr3 and CsPb(Br/I)3 QDs under X-ray excitation.
d, e, Multicolour visualization of as-developed perovskite QD scintillators
using X-rays (d) and the corresponding bright-field image (e).
Extended Data Fig. 5 | Comparison of X-ray-induced luminescence
for lead halide perovskite materials. a–c, Radioluminescence spectra of
bulk single-crystal CH3NH3PbBr3 (a) and CsPbBr3 (b) and of CsPbBr3
nanocrystals (c) under X-ray excitation at 5.0 μGy s⁻¹ and 278 μGy s⁻¹.
The insets are photographs of CsPbBr3 bulk single crystal (b) and
nanocrystal powders (c) taken under ambient light (top) and X-ray
illumination (bottom). The X-ray dose rate used for the experiments was
278 μGy s⁻¹. d, Comparison of radioluminescence intensity for three types
of perovskite material under 278 μGy s⁻¹ X-ray excitation. e, PLQY and
exciton binding energy of perovskite materials at 300 K. We note that
the thermal energy at 300 K is kBT ≈ 25 meV. Nanocrystalline perovskites
are highly luminescent materials, whereas bulk perovskite crystals are
most suitable for the generation of free charge carriers owing to their low
exciton binding energy.
Extended Data Fig. 6 | Measurement of exciton binding energy and
synchrotron-radiation-induced radioluminescence of CsPbBr3
nanocrystals. a, Temperature-dependent scintillation spectra of the
CsPbBr3 nanocrystals at 77–300 K under X-ray illumination at 50 μGy s⁻¹.
b, Arrhenius plot of the X-ray-induced luminescence intensities of
the CsPbBr3 nanocrystals at 532 nm. c, Experimental setup for the
synchrotron-radiation-induced radioluminescence measurements at the
X-ray beamline of the Shanghai Synchrotron Radiation Facility (SSRF).
The X-ray energy is 10–38 keV. d, The electronic edge energies for Pb L, Cs
K and Br K.
Extended Data Fig. 7 | Electronic structure and scintillation mechanism
of the CsPbBr3 nanocrystal along selected reciprocal high-symmetry
points within the Brillouin zone. a, The Brillouin zone of the cubic-phased
crystal lattice in CsPbBr3, calculated using relativistic corrections.
Γ, M, R and X denote high-symmetry points within the reciprocal space
(blue). b, c, Calculated electron density associated with the valence band
(b) and the conduction band (c) of the cubic CsPbBr3. We note that the
halide ions contribute to the changes of the bandgap in the perovskite
nanocrystal through the influence of the valence orbitals. Cs, purple; Pb,
grey; Br, orange; electron orbital, blue; hole orbital, green. d–g, Localized
electronic and hole levels for V_Br and V_Pb at different charge states.
h, i, Schematic diagram of energy transfer for radioluminescence induced
by intrinsic lattice defects and the quenching effect caused by evident ion
movement (Cs). V_Pb^δ denotes a Pb vacancy with small (|δ| < 1) charge
transfer. j, Thermodynamic transition levels of the perovskite nanocrystal.
E_v, maximum valence band energy; E_c, minimum conduction band energy.
… properties of CsPbBr3 nanocrystals. a, Simplified model of CsPbBr3
structure composed of 293 atoms with a particle size of 12.06 Å.
b, Calculated orbital contour plots of a CsPbBr3 nanocrystal, showing
the HOMO (blue), LUMO (green) and SVIC trapping (red) states. The
SVIC states are formed owing to unsaturated p orbitals of surface Br sites.
c, PDOS of the CsPbBr3 QD. The SVIC state is located near the Fermi …
(VB) maximum as a function of inter-particle distance, d. The red line is
calculated by fitting with the Gaussian distribution function. The top inset
shows the simulation model used to calculate the energy evolution of the
CsPbBr3 nanocrystal as a function of particle distance. CB, conduction
band.
Extended Data Fig. 9 | Characterization of absorption cross-section and
transient luminescence spectra of the CsPbBr3 nanocrystals.
a, Absorption spectra of CsPbBr3 nanocrystals dispersed in cyclohexane
at different concentrations. b, Absorption as a function of concentration
for CsPbBr3 nanocrystals. The molar extinction coefficient, ε, was
determined by applying the Beer–Lambert law A = εcL, where A is the
absorbance, c is the molar concentration (mol l⁻¹) and L is the optical
path length (1 cm) through the sample. The absorption cross-section,
σ, was determined by ε = N_A σ/(1,000 × ln 10), where N_A is Avogadro's
number. c, Photoluminescence (PL) lifetime of a single CsPbBr3
perovskite nanocrystal. d, Second-order correlation function, g^(2)(τ), of
the nanocrystal. The value g^(2)(0) = 0.16 confirms the single-quantum-emitter
nature of the photon emission. e, Fluorescence intermittency
trace recorded for a single CsPbBr3 perovskite nanocrystal. The recorded
photoluminescence intensity reaches more than 2,000 counts per
20-ms bin.
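
The two relations in this caption, the Beer–Lambert law A = εcL and ε = N_A σ/(1,000 × ln 10), can be chained to turn an absorbance reading into an absorption cross-section. The sketch below is only an illustration: the function names and the example absorbance and concentration are hypothetical and are not measured values from this work.

```python
import math

AVOGADRO = 6.02214076e23  # Avogadro's number, mol^-1


def molar_extinction(absorbance: float, conc_mol_per_l: float,
                     path_cm: float = 1.0) -> float:
    """Beer-Lambert law A = eps * c * L, solved for eps (l mol^-1 cm^-1)."""
    return absorbance / (conc_mol_per_l * path_cm)


def cross_section_cm2(epsilon: float) -> float:
    """Invert eps = N_A * sigma / (1000 * ln 10) to obtain sigma in cm^2."""
    return 1000.0 * math.log(10) * epsilon / AVOGADRO


if __name__ == "__main__":
    # Hypothetical reading: A = 0.5 at c = 1e-7 mol/l in a 1-cm cuvette.
    eps = molar_extinction(0.5, 1e-7)
    print(eps, cross_section_cm2(eps))
```

In practice ε would come from the slope of absorbance against concentration over several dilutions, as panel b describes, rather than from a single reading.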
spectra recorded at 10, 20, 30, 40 and 50 kV. d, Kinetic measurement of
Extended Data Fig. 11 | Direct X-ray imaging and multiplexed
labelling for in vivo optical imaging using perovskite nanocrystal
scintillators. a–d, A flexible flat cable (a) and needle-implanted pork
tissue (b) were imaged with bright-field and X-ray imaging (c, d).
We note that the CsPbBr3 nanocrystal scintillator platform shown in
Fig. 3d was used for the X-ray phase contrast imaging. In both cases,
the X-ray images clearly reveal the presence of metallic wires embedded
in the cable and pork tissue. e, Synthesis of CsPbBr3/SiO2 core-shell
nanoparticles with a hydrophobic surface for protection against moisture.
RT, room temperature. f, TEM image of the as-prepared CsPbBr3/SiO2
nanoparticles. g, Multicolour luminescence spectra of the perovskite
nanocrystals under X-ray irradiation at a voltage of 50 kV. The materials’
compositions are CsPbBr3, CsPbBr1.5I1.5 and CsPbBr1.2I1.8 for green (G),
orange (O) and red (R) emissions, respectively. We note that the risk
of lead toxicity must be considered during experimentation. h, Bright-field
and multicolour luminescent in vivo imaging in mice under X-ray
excitation at a voltage of 50 kV. The X-ray-induced luminescence was
recorded by a charge-coupled-device camera equipped with three optical
filters at 530 nm, 630 nm and 670 nm.
Extended Data Table 1 | Scintillation characteristics for different materials
Extended Data Table 2 | Properties of perovskite nanocrystals and bulk crystals used for X-ray detection

Chemical component | Material type | Electric response | Luminescence response | Multicolour scintillation | Detection limit (μGyair s⁻¹) | Imaging method | Remarks
CsPbX3 nanocrystals (X = Cl, Br, I) | Scintillator | Low | Highly sensitive | Yes | 0.013 | Photon-to-photon conversion | This work (a)
MAPbI3 thin film | Semiconductor | Good | Not reported | No | ~1,000 | Photon-to-current conversion | Nature Photon. 2015, 9, 444.
MAPbBr3 single crystal | Semiconductor | Excellent | Not reported | No | 0.5 | - | Nat. Photon. 2016, 10, 333.
MAPbBr3 single crystal | Semiconductor | Excellent | Not reported | No | 0.036 | Photon-to-current conversion | Nature Photon. 2017, 11, 315.
MAPbI3 microcrystal | Semiconductor | Excellent | Not reported | No | 48 | - | Nature Photon. 2017, 11, 436.
Cs2AgBiBr6 single crystal | Semiconductor | Excellent | Not reported | No | 0.0597 | - | Nature Photon. 2017, 11, 726.
MAPbI3 polycrystalline | Semiconductor | Excellent | Not reported | No | - | Photon-to-current conversion | Nature 2017, 550, 87.
MAPbBr3 nanocrystals | Scintillator | - | Yes | - | - | Photon-to-photon conversion | This work (b)

Data are from this work and from refs 12–16 and 18. MA, CH3NH3.

(a) We note that scintillators are a special class of luminescence materials that have been most widely used for radiation detection by converting high-energy X-ray photons into visible light. This sensing
process differs from that in semiconductor materials, where the dominant process is photon-to-current conversion. The as-developed perovskite nanocrystal scintillators are solution-processable and
can respond to X-rays with multicolour output, which can be readily recorded by a common digital camera.

(b) We note that the organic-inorganic hybrid QDs are less photostable than their purely inorganic counterparts and their synthesis is relatively difficult to scale up owing to the need for stringent control
over the reaction conditions.
A Brownian quasi-crystal of pre-assembled colloidal Penrose tiles

Po-Yuan Wang & Thomas G. Mason


Penrose’s pentagonal P2 quasi-crystal is a beautiful, hierarchically
organized multiscale structure in which kite- and dart-shaped tiles 
are arranged into local motifs, such as pentagonal stars, which are 
in turn arranged into various close-packed superstructural patterns 
that become increasingly complex at larger length scales. Although 
certain types of quasi-periodic structure have been observed in hard 
and soft matter, such structures are difficult to engineer, especially 
over large areas, because generating the necessary, highly specific 
interactions between constituent building blocks is challenging. 
Previously reported soft-matter quasi-crystals of dendrimers,
triblock copolymers, nanoparticles and polymeric micelles have
been limited to 12- or 18-fold symmetries. Because routes for self-assembling
complex colloidal building blocks into low-defect
dynamic superstructures remain limited, alternative methods,
such as using optical and directed assembly, are being explored.
Holographic laser tweezers and optical standing waves
have been used to hold microspheres in local quasi-crystalline
arrangements, and magnetic microspheres of two different sizes
have been assembled into local five-fold-symmetric quasi-crystalline
arrangements in two dimensions. But a Penrose quasi-crystal of
mobile colloidal tiles has hitherto not been fabricated over large 
areas. Here we report such a quasi-crystal in two dimensions, created 
using a highly parallelizable method of lithographic printing and 
subsequent release of pre-assembled kite- and dart-shaped tiles into 
a solution-dispersion containing a depletion agent. After release, 
the positions and orientations of the tiles within the quasi-crystal 
can fluctuate, and these tiles undergo random, Brownian motion in 
the monolayer owing to frequent collisions between neighbouring 
tiles, even after the system reaches equilibrium. Using optical 
microscopy, we study both the equilibrium fluctuations of the 
system at high tile densities and also the ‘melting’ of the pattern as 
the tile density is lowered. At high tile densities we find signatures of 
a five-fold pentatic liquid quasi-crystalline phase, analogous to a six- 
fold hexatic liquid crystal. Our fabrication approach is applicable to 
tiles of different sizes and shapes, and with different initial positions 
and orientations, enabling the creation of two-dimensional quasi- 
crystalline systems (and other systems that possess multiscale 
complexity at high tile densities) beyond those of current self- or 
directed-assembly methods. We anticipate that our approach
for generating lithographically pre-assembled monolayers could 
be extended to create three-dimensional Brownian systems of 
fluctuating particles with custom-designed shapes through 
holographic lithography or stereolithography.

To make lithographically pre-assembled monolayers (litho-PAMs; 
see Methods), we combine computer-aided design software and lithog- 
raphy to fabricate, position and orient many shape-designed colloidal 
particles in a desired complex initial configuration. After lithographic 
printing using an optical stepper and development, we obtain a 
pre-assembled static set of discrete prismatic polymeric particles, each 
approximately 2 μm thick and composed of cross-linked epoxy SU-8
photoresist, attached to a thin (approximately 10 nm) layer of a release
material (Omnicoat) on a smooth glass wafer (Fig. 1a, Extended Data
Fig. 1). The particles (or ‘tiles’) are shaped like kites (convex quad- 
rilateral prisms) or darts (non-convex quadrilateral prisms) and are 
arranged in a Penrose P2 quasi-crystalline pattern, as confirmed by 
optical and scanning electron micrographs (Fig. 1b), at a tile area fraction
(ratio of tile area to total monolayer area) of φ_A ≈ 0.78. We enclose
the printed region in polydimethylsiloxane (PDMS) elastomeric 
walls and add an aqueous release solution-dispersion (RSD) that we 
custom-formulated to maintain an intact monolayer of fluctuating 
tiles that have nearly hard in-plane interactions (that is, the tiles in the 
monolayer are non-interacting except upon collisional contact, when 
they repel impulsively, ensuring that tile overlap is forbidden). The 
RSD contains a base (tetramethylammonium hydroxide) to dissolve 
the Omnicoat, sodium dodecyl sulfate surfactant to prevent the SU-8 
tiles that are released from aggregating or sticking to the glass substrate, 
and a depletion agent (such as anionic polystyrene nanospheres) to 
create anisotropic roughness-controlled depletion attractions that
maintain the monolayer (see Methods and Supplementary Methods 
for details). 

After adding the RSD to the PDMS well (Fig. 1c, Extended Data 
Fig. 1), we image the tiles as they release using an inverted optical 
microscope. The previously static tiles begin to undergo Brownian 
motion, colliding frequently with neighbouring tiles as the system
equilibrates at fixed φ_A (Fig. 1d). Time-lapse digital microphotography
yields high-resolution videos of the release kinetics (Supplementary
Video 1). By digitally subtracting the initial image before release from
images at later times t after adding the RSD, we determine the fraction
of tiles that are released, P_r(t) (Fig. 1e). The observed release profile
is consistent with first-order reaction kinetics: P_r(t) = 1 − exp(−t/τ),
where τ ≈ 1.6 h is the characteristic timescale of release (Fig. 1e). Thus,
more than 99% of tiles are released after approximately 8 h, and all tiles
are released after about 20 h. Although the depletion agent preserves 
the monolayer, very rarely, a strong local Brownian fluctuation can 
expel a tile vertically, so a few point defects can be seen for t > 72 h. 
Therefore, by t~ 48 h, many in-plane interparticle collisions have 
occurred but there are very few defects, so we consider the monolayer 
to have effectively equilibrated. Despite ongoing Brownian excitations, 
φ_A ≈ 0.78 is large enough that different superstructures of kite-dart
motifs, although transiently distorted as tiles explore accessible micro- 
states, remain intact long after release over large length scales (Extended 
Data Fig. 2) and the confined P2 quasi-crystal of mobile tiles does not 
melt into a disordered liquid-like state. 
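
The first-order release law quoted above, P_r(t) = 1 − exp(−t/τ) with τ ≈ 1.6 h, can be checked numerically against the statement that more than 99% of tiles are released after about 8 h. The following is a minimal sketch; the function names are illustrative and are not taken from the original work.

```python
import math

TAU_H = 1.6  # characteristic release time in hours, as quoted in the text


def released_fraction(t_h: float, tau_h: float = TAU_H) -> float:
    """First-order kinetics: fraction of tiles released after t_h hours."""
    return 1.0 - math.exp(-t_h / tau_h)


def time_to_fraction(p: float, tau_h: float = TAU_H) -> float:
    """Invert the release law: hours needed to reach released fraction p."""
    return -tau_h * math.log(1.0 - p)


if __name__ == "__main__":
    print(released_fraction(8.0))   # fraction released after 8 h
    print(time_to_fraction(0.99))   # hours needed to reach 99% release
    print(released_fraction(20.0))  # fraction released after 20 h
```

With τ = 1.6 h, 8 h corresponds to five time constants, so the released fraction is 1 − e⁻⁵, slightly above 99%. At 20 h the predicted unreleased fraction is of order 10⁻⁶, which is consistent with essentially all tiles being free.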

Brownian fluctuations of mobile P2 tiles could alter the degree of 
quasi-crystalline order compared to static tiles, so we take Fourier 
transforms of optical micrographs to produce the equivalent of 
light-scattering intensity patterns (see Methods for details). The nearly 
perfect quasi-crystalline order before release (Fig. 2a) yields Bragg-like 
peaks at low scattering wavenumbers q, indicative of ordering of larger 
superstructures of tiles, and ten narrow rays that extend from the centre 
towards high q, indicative of the high degree of alignment of the edges 
of all P2 tiles along only specific axes in the plane (Fig. 2b). Applying 
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Fig. 1 | Creating a fluctuating Brownian quasi-crystal of mobile 
Penrose kite and dart tiles in a confined monolayer. a, After designing 
and fabricating a mask that contains the desired arrangement and area 
fraction @, of tiles, ultraviolet stepper lithography is used to print 
cross-linked polymeric Penrose tile particles in a P2 quasi-crystalline 
pattern by cross-linking negative SU-8 photoresist at a high a inside a 
pentagonal boundary with an inner edge length of 4.5 mm. Development 
removes unexposed SU-8 between the tiles, which are attached rigidly 

to a 10-nm-thick release layer of water-soluble Omnicoat on a glass 
wafer, thereby enabling observation from below using an inverted 
bright-field transmission optical microscope. Inset, scanning electron 
microscope (SEM) image of a kite particle; scale bar, 5 ,1m. All tiles are 
21m thick. b, Micrographs of pre-configured kite and dart tiles in an 
ideal quasi-crystalline pattern at , ~ 0.78, after development. Left, 
optical microscope image; right, SEM image; scale bar, 101m. c, Solid 
elastomeric PDMS walls are fabricated to enclose the printed pattern and 
are attached to the glass. An aqueous RSD is loaded into this PDMS well 
and a glass coverslip is placed on top to inhibit evaporation. The basic 


terminology originally developed for systems of anisotropic molecules 
rather than colloidal tiles, the narrowness of these rays at high q implies 
that the lithographically pre-assembled P2 tiles have a high degree of 
long-range molecular orientational order, where here ‘molecular orientations’
are understood to mean ‘tile orientations’. Expanded views
at lower q and very low q (Fig. 2c, d) reveal sets of ten Bragg peaks, 
demonstrating long-range pentagonal quasi-crystalline ordering of 
motifs and superstructures of motifs. 

After release and equilibration, at any given instant the mobile kite 
and dart tiles are no longer in an ideal P2 tiling (Fig. 2e). As a result, 
Brownian fluctuations cause the ten rays at high q to broaden azimuthally
(Fig. 2f). At intermediate q, Bragg peaks have disappeared and
instead ten-fold modulations in ring-like intensity patterns are seen
(Fig. 2g), which are reminiscent of the six-fold modulations seen in
hexatic liquid-crystal systems. Interestingly, large superstructures
of motifs retain considerable spatial and orientational order: peaks, 
although broadened, are still observed at the very lowest q (Fig. 2h). 


nature of this RSD dissolves the release layer, the surfactant (anionic 
dodecyl sulfate) adsorbs onto the released tiles and prevents aggregation 
through screened-charge repulsions, and the nanoparticle dispersion 

of polystyrene spheres (40-nm diameter, sulfate-stabilized) produces a 
roughness-controlled depletion attraction between the faces of the tiles 
and the glass substrate that prevents the released particles from leaving the 
monolayer, yet keeps their in-plane interactions nearly hard. Inset, SEM 
image of a dart particle; scale bar, 5 μm. d, Optical micrograph taken 48 h
after adding the RSD. The right half of the image has been colour-coded
(blue, kites; red, darts) using post-acquisition digital analysis; scale bar,
10 μm. e, Released fraction of mobile particles P as a function of time
t. The solid red line is a fit to the data with the function 1 − exp(−t/τ),
assuming first-order reaction kinetics, which yields a release time constant
of τ = 5,680 s. Inset, schematic side views before and after release. The
release layer (green) is dissolved, and the released tiles exhibit Brownian 
fluctuations in a fully submerged monolayer just above the glass surface, 
which is negatively charged to ensure that a lubricating layer of water is 
maintained between the tiles and the glass. 
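
The first-order release curve quoted for Fig. 1e can be evaluated directly. The following Python sketch is illustrative only: it assumes the fitted form 1 − exp(−t/τ) with τ = 5,680 s as stated in the caption, and the function names are hypothetical labels rather than part of the original analysis.

```python
import math

TAU_S = 5680.0  # release time constant quoted for the fit in Fig. 1e (seconds)


def released_fraction(t_s: float, tau_s: float = TAU_S) -> float:
    """Fraction of tiles released after t_s seconds, assuming first-order kinetics."""
    if t_s < 0:
        raise ValueError("time must be non-negative")
    return 1.0 - math.exp(-t_s / tau_s)


def time_to_fraction(fraction: float, tau_s: float = TAU_S) -> float:
    """Time in seconds at which the given released fraction is reached."""
    if not 0.0 <= fraction < 1.0:
        raise ValueError("fraction must lie in [0, 1)")
    return -tau_s * math.log(1.0 - fraction)


if __name__ == "__main__":
    for hours in (1, 2, 4, 8):
        print(f"{hours} h: released fraction = {released_fraction(hours * 3600.0):.3f}")
    print(f"time to 99% release: {time_to_fraction(0.99) / 3600.0:.2f} h")
```

Under this model the half-release time is τ ln 2, and release is essentially complete well before the 48-h observation time used for the micrographs.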


Because entropic-thermal line broadening is very evident at high q, 
we fit the azimuthal intensity I(ψ) of a ray using a Gaussian function
(Fig. 2i; see Methods for details). The peak intensity of this Gaussian is reduced
by a factor of roughly seven and its width is increased by a factor of
about two relative to the pre-release intensity profile. At intermediate
q, we fit I(ψ) before release with a double Gaussian function to capture
the Bragg-like peak; we fit I(ψ) after release using a single Gaussian
function (Fig. 2j; see Methods for details). This disappearance of Bragg
peaks, yet preservation of a ten-fold modulation in ψ at intermediate
q long after release indicates that motifs of mobile tiles are no longer 
exactly spatially ordered, but that the majority of motifs do preserve 
a substantial degree of orientational order. Similar broadening effects 
after release are also evident in Fourier transforms of digitally modified 
images showing only kite tiles (Extended Data Fig. 3) or only dart tiles 
(Extended Data Fig. 4). These Fourier transforms long after release 
provide evidence of the existence of a new phase of matter, which we 
call a ‘pentatic liquid quasi-crystal’.
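
To make the line-shape comparison concrete, the sketch below estimates the peak, centre and width of a sampled Gaussian I(ψ). It is a minimal illustration under stated assumptions: it uses a simple method-of-moments estimate on a baseline-free profile, not the fitting procedure described in Methods, and the two synthetic profiles are constructed only to reproduce the quoted factors of roughly seven (peak) and two (width). All names and numerical values in the code are hypothetical.

```python
import math


def gaussian(psi: float, peak: float, centre: float, sigma: float) -> float:
    """Baseline-free Gaussian line shape I(psi)."""
    return peak * math.exp(-0.5 * ((psi - centre) / sigma) ** 2)


def moment_estimate(psis, intensities):
    """Estimate (peak, centre, sigma) from samples on a uniform psi grid.

    Assumes no background and that the grid spans the whole peak.
    """
    total = sum(intensities)
    centre = sum(p * i for p, i in zip(psis, intensities)) / total
    variance = sum((p - centre) ** 2 * i for p, i in zip(psis, intensities)) / total
    sigma = math.sqrt(variance)
    step = psis[1] - psis[0]
    # Area under a Gaussian is peak * sigma * sqrt(2 * pi).
    peak = total * step / (sigma * math.sqrt(2.0 * math.pi))
    return peak, centre, sigma


if __name__ == "__main__":
    # Synthetic profiles in arbitrary units; psi in degrees within one 36-degree sector.
    psis = [-18.0 + 0.1 * k for k in range(361)]
    before = [gaussian(p, 7.0, 0.0, 1.5) for p in psis]
    after = [gaussian(p, 1.0, 0.0, 3.0) for p in psis]
    peak_b, _, sigma_b = moment_estimate(psis, before)
    peak_a, _, sigma_a = moment_estimate(psis, after)
    print(f"peak ratio (before/after):  {peak_b / peak_a:.2f}")
    print(f"width ratio (after/before): {sigma_a / sigma_b:.2f}")
```

A moment estimate is sensitive to any background under the peak, so a least-squares fit with an explicit baseline term would be preferable for real Fourier-transform data.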


6 SEPTEMBER 2018 | VOL 561 | NATURE | 95 


© 2018 Springer Nature Limited. All rights reserved. 


[Fig. 2 image; recoverable panel labels: ‘Before release’, ‘After release’]

Fig. 2 | Entropic restructuring of ordered Penrose kite and dart tiles 
into a fluctuating liquid quasi-crystal monolayer after release. 

a, Optical micrograph of Penrose quasi-crystal tiles before release 
(interiors of tiles are filled black to enhance contrast); scale bar, 20 μm.

b, Effective scattering pattern provided by the Fourier transform intensity 
of a, showing ten rays extending from the centre to high scattering 
wavenumbers q. (q is a radial distance measured outward from the centre 
of the Fourier transform, where q = 0; the white scale bar defines the 
magnitude of q in the Fourier transform.) c, Central region of b, magnified 
by a factor of about six, revealing Bragg peaks at intermediate and low q. 
d, Central region of c, magnified by a factor of about two, revealing Bragg 
peaks at very low q associated with superstructural ordering of motifs of 
tiles over large length scales. e, Optical micrograph of fluctuating Penrose 
quasi-crystal tiles 48 h after release (tile interiors filled black). Entropic 
Brownian fluctuations destroy the ideal quasi-crystalline order; the tiles 
no longer have perfect positions and orientations on a quasi-crystalline 


Equilibrium Brownian forces cause motifs—such as the pentago- 
nal flower composed of 15 kite tiles and five dart tiles (Fig. 3a, top; 
Supplementary Video 2) and the pentagonal wheel composed of ten kite 
tiles and five dart tiles (Fig. 3a, bottom; Supplementary Videos 3, 4)— 
to fluctuate and distort randomly, breaking chiral symmetry locally 
without melting. Within the fluctuating P2 system, we observe that
five kites in pentagonal star-like motifs (PSKMs) can rotate collectively, 
and the entire motif makes rotational transitions between different 
preferred angles, defined by corrugations in proximate boundary tiles 
(Fig. 3b, top; Supplementary Video 5). We measure the heterogeneous
dynamics of these collective rotational fluctuations by digitally
tracking the rotational angle α and the trajectory of the centroid of one
kite in the PSKM over time (Fig. 3b, bottom; Fig. 3c). By contrast, five 
darts in pentagonal star-like motifs (PSDMs), which have much more 
corrugated exteriors, fluctuate and distort, which leads to bounded 
Brownian motion that is anisotropic and reflects the local symmetry 
of the fluctuating quasi-crystal (Methods, Supplementary Video 4, 
Extended Data Fig. 5). Although all motifs distort, and PSKMs also 
rotate, larger-scale superstructures made up of multiple motifs appear 
to preserve long-range orientational order. To quantify this, we create a 
motif superstructural orientational pair-correlation function g_MSO-PCF
that depends on the separation r between the centres of two similar 
superstructures of motifs (Supplementary Methods). For instance, we 
define a superstructure of five PSKMs in a nearly regular pentagonal 
configuration that surrounds a central PSKM (Fig. 3d); superstructural

lattice, but the overall morphology of the quasi-crystal is preserved.

f, Average Fourier transform intensities of three different micrographs, 
taken 46 h, 48 h (e) and 50 h after release. g, h, Close-ups of f, using 

the same magnification factors as for c and d, respectively. All but the 
superstructural peaks at the very lowest q have become smeared out, 
leaving only ten-fold azimuthal intensity modulations at intermediate 

and high q. i, High-q azimuthal line shapes I(ψ) of emanating rays before
release (black circles, left inset) and after release (blue triangles, right
inset). In the insets, the azimuthal angle ψ is defined between white solid
lines, and the green dashed lines define the annular sectors, inside white
dashed lines at high q, used to calculate I(ψ) over the range of ψ displayed.
Solid lines in the main panel are fits using Gaussian functions (see text).

j, Intermediate-q I(ψ) (insets as in i) before release (black circles) and after
release (blue triangles). Black line, fit using a double Gaussian; blue line, 
fit using a Gaussian (see text). k, Intensity scale for all Fourier transforms 
shown. 


bond angles (Fig. 3d, defined by dashed arrows) are then correlated in 
a manner analogous to bond orientational pair-correlation functions 
of individual particles. Before release, g_MSO-PCF(r) = 1, whereas after
release g_MSO-PCF(r) decays exponentially to a plateau that is just below
unity, indicating strong preservation of relative superstructural bond 
angles over long distances (Fig. 3e). Therefore, motif superstructural 
orientational order is preserved over long times and distances even as 
distortions and rotations of smaller constituent motifs occur. Moreover, 
because the P2 quasi-crystal of mobile tiles exhibits solid-like behaviour,
similar to simpler dispersed systems of colloidal hard spheres at
high densities, we observe entropically generated sound wavelets that
scatter locally and are damped (Supplementary Video 6). 
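
The decay of the correlation function to a plateau can be written as g(r) = g∞ + (1 − g∞) exp(−r/ξ). The Python sketch below evaluates this form; it is an illustration under assumptions, not the authors' fitting code. The plateau g∞ = 0.96 is the value quoted for the fit in Fig. 3e, whereas the decay length ξ is not given in this section, so the value used in the example is a purely hypothetical placeholder.

```python
import math

G_PLATEAU = 0.96  # large-r plateau quoted for the fit in Fig. 3e


def g_mso_pcf(r_um: float, xi_um: float, plateau: float = G_PLATEAU) -> float:
    """Exponential decay from 1 at r = 0 to the plateau at large r.

    xi_um is the decay length; its value is an assumption supplied by the caller.
    """
    if r_um < 0:
        raise ValueError("separation must be non-negative")
    if xi_um <= 0:
        raise ValueError("decay length must be positive")
    return plateau + (1.0 - plateau) * math.exp(-r_um / xi_um)


if __name__ == "__main__":
    xi_placeholder_um = 50.0  # hypothetical decay length, for illustration only
    for r in (0.0, 25.0, 50.0, 100.0, 400.0):
        print(f"r = {r:6.1f} um: g = {g_mso_pcf(r, xi_placeholder_um):.4f}")
```

Because the plateau stays close to unity, the function never falls far below 1 regardless of ξ, which is the signature of persistent long-range orientational order of the superstructures.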

We cause dense P2 quasi-crystals to melt by removing a confining 
wall. Tiles diffuse gradually into the available empty space over time, 
thereby creating a gradient in φ_A (Fig. 4a–c, Supplementary Video 7).
Digitally colour-coded PSKMs (blue) and PSDMs (red) progressively
melt. We measure φ_A as a function of distance d at different t (Fig. 4d),
and the area fractions of intact PSKMs φ_A,KM (Fig. 4e) and PSDMs
φ_A,DM (Extended Data Fig. 6). Fermi functions, consistent with diffusive
melting, fit the measured φ_A(d), φ_A,KM(d) and φ_A,DM profiles (solid
lines in Fig. 4d, e, Extended Data Fig. 6; see Methods for details and 
Extended Data Table 1 for fit parameters). Fourier transforms of the 
same region before, during and after the passage of the melting front 
(Fig. 4f–h) reveal the progressive disappearance of spatial and orientational
order. Peaks at very low q, which were evident before melting

Fig. 3 | Motif dynamics and superstructural orientational pair- 
correlation function. a, Optical micrographs of fluctuating pentagonal 
motifs of Penrose kite and dart tiles before release (0 h), and 12 h, 24 h,
36 h and 48 h after release. Top, flower motif consisting of 15 kite tiles and
five dart tiles; bottom, wheel motif consisting of ten kite tiles and five dart
tiles. Scale bar, 10 μm. Entropic fluctuations lead to local chiral-symmetry
breaking of motifs. b, Collective rotational fluctuations of a central
pentagonal star of kites (black outlines) within a flower. One particular
kite is coloured to show the collective rotation of the entire motif over time
t (top); its rotational angle α versus t (bottom) exhibits hopping behaviour
between preferred angles (plateaus separated by about 36°) that match
corrugations of the surrounding tiles. Scale bar, 10 μm. c, Trajectory of
the centroid of the coloured kite in b, which displays local heterogeneous
dynamics. Scale bar, 3 μm. d, One pentagonal superstructure of PSKMs
(kites filled blue, black dot at centre) is separated by a centre-to-centre 
distance r from a second pentagonal superstructure of PSKMs (kites filled 
blue, red dot at centre). Solid green arrow, vector between superstructure 


(Fig. 4f), disappear, and only isotropic rings, indicative of liquid-like 
disorder, appear in the Fourier transforms immediately after melting 
(Fig. 4h). To determine the value of φ_A associated with melting, we
eliminate d and plot φ_A,KM versus φ_A at different t (Fig. 4i). For longer
t, towards equilibrium, the measured φ_A,KM(φ_A) curves almost overlap;
so, by considering φ_A,KM → 0, we find that the P2 quasi-crystal
effectively melts at φ_A,melt ≈ 0.65 ± 0.02 (here and elsewhere, the values
quoted are mean ± s.d.). This melting and the diffusion of individual
tiles into the empty space provide additional empirical evidence that
any residual in-plane depletion attractions between tiles are weak compared
to thermal energy k_BT, where k_B is Boltzmann’s constant and T
is the temperature (Supplementary Information). 
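
The extrapolation φ_A,KM → 0 amounts to fitting a straight line to φ_A,KM versus φ_A and reading off its intercept with the horizontal axis. The Python sketch below shows that arithmetic with an ordinary least-squares fit. It is a hedged illustration: the data points are synthetic placeholders constructed to lie on a line that crosses zero at 0.65, they are not measured values, and the function names are hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def melting_area_fraction(phi_a, phi_a_km):
    """Intercept of the fitted line with the horizontal axis (phi_A,KM = 0)."""
    slope, intercept = linear_fit(phi_a, phi_a_km)
    if slope == 0:
        raise ValueError("fitted line is horizontal; no intercept")
    return -intercept / slope


if __name__ == "__main__":
    # Synthetic placeholder data (NOT measurements): a line through (0.65, 0).
    phi_a = [0.68, 0.70, 0.72, 0.74, 0.76]
    phi_a_km = [2.0 * (x - 0.65) for x in phi_a]
    print(f"estimated melting area fraction: {melting_area_fraction(phi_a, phi_a_km):.3f}")
```

With real data, the quoted uncertainty of ±0.02 would follow from the scatter of the points about the fitted line, which this sketch does not attempt to estimate.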

The litho-PAM method that we have developed to make a fluctuating 
P2 quasi-crystal can readily be generalized to provide diverse, complex 
and essentially defect-free experimental soft-matter systems that are 
suitable for testing predictions of theories and simulations, including 
fluctuating dynamics and kinetics. Specifically, litho-PAMs can be 
used to reveal and measure localized and anisotropic translational and 
rotational dynamics associated with different particle shapes or local 
groups of shapes, some of which can move collectively, over a wide 
range of densities and length scales. These observations could motivate 
improvements in simulations, which sometimes do not report results 
in units of real time that can be compared directly with experiments. 
Detailed statistical analyses of measured trajectories of tile collisions 
at different relative positions and orientations in experiments could 
potentially be compared to simulations to deduce and quantify inter- 
actions between neighbouring tiles, such as hydrodynamic interactions 
and site-specific attractive or repulsive interactions.

centres; dashed green arrows, vectors between superstructure centres and
PSKM centres. Scale bar, 20 μm. e, Motif superstructural orientational
pair-correlation function g_MSO-PCF as a function of r at different times t
after release: black circles, 24 h; blue squares, 48 h; red diamonds, 72 h;
green triangles, 96 h. Before release, g_MSO-PCF = 1 for all r. After release,
random Brownian excitations of the system cause slight fluctuations in
the orientations of superstructures of motifs. Error bars, s.d. (for r > 0;
g_MSO-PCF = 1 exactly by definition for r = 0); statistical uncertainties
largely overlap for t > 48 h. Solid red line, fit to the average of the data
at 48 h and 72 h using exponential decay of g_MSO-PCF from 1 to a plateau
of g_MSO-PCF = 0.96 at large r. The high and persistent value of g_MSO-PCF
near but below unity for large r over long times indicates that long-range 
orientational order of superstructures of motifs persists in equilibrium 
after release, even as perfect tile-tile spatial order is destroyed over long 
distances as a consequence of the entropic Brownian fluctuations of the 
tiles. 


Extending theories of multi-body systems of single colloidal shapes 
to multiscale systems containing two or more different tile shapes that 
have hard or nearly hard in-plane interactions, such as the system that 
we have demonstrated, represents an exciting challenge. For such fluc- 
tuating monolayers of hard tiles, the free energy is entirely entropic in 
origin, which is considerably different from the free energy of strongly 
bonded configurations of atoms in classic solid-state quasi-crystals
(see Methods section ‘Additional discussion’). Ultimately, theories of 
entropy-dominated fluctuating monolayers of complex shapes could be 
used to predict the phase behaviour, dynamic heterogeneity and melt- 
ing of the soft-matter P2 quasi-crystal and other multiscale structures 
made using litho-PAMs. 

Although we have demonstrated one viable route for creating litho- 
PAMs using optical stepper lithography and a specifically formulated 
RSD that creates anisotropic roughness-controlled depletion attrac- 
tions, the general concept of fluctuating litho-PAMs is not limited to 
these specific methods of fabrication, these materials or this particular 
type of interaction. Other methods of fabrication, such as forms of 
nanolithography (see Supplementary Discussion), could also be used. 
The tiles do not need to be slab-like, but instead could have more com- 
plex, out-of-plane shapes, and their overall lateral dimensions could 
be much smaller. Beyond cross-linked polymeric photoresist, the tiles 
could alternatively be composed of inorganic materials or metals. 
Likewise, other release materials and agents could be used. Although 
anisotropic roughness-controlled depletion attractions provide a con- 
venient combination of strong out-of-plane ‘lubricated’ attractions 
between the faces of the tiles and the substrate (so the tiles do not 
stick rigidly to the substrate but instead remain attractively bound to

Fig. 4 | Entropic melting dynamics of an unconfined fluctuating
quasi-crystal. a–c, Colour-coded optical micrographs of Penrose kite and dart
tiles that melt diffusively into an unconfined open region beyond the right
edge of the micrograph, at different times after release: a, 4 h; b, 24 h;
c, 48 h. Blue, PSKMs; red, PSDMs. d, Area fraction φ_A as a function of
distance d measured from the left of the micrographs in a–c, at different
times after release: black circles, 4 h; orange diamonds, 24 h; purple
triangles, 48 h. Lines are fits using an empirical Fermi-like function
(Methods). e, Area fraction of unmelted PSKMs φ_A,KM as a function of

d at different times after release (symbols and lines as in d). f, Fourier 
transform of the image in a inside the yellow dashed square. Ten-fold 
azimuthal intensity modulations are evident. g, Fourier transform of 


it while maintaining a layer of liquid in between) and only very weak 
residual in-plane attractions, their use is not an inherent aspect of litho- 
PAMs. Other types of anisotropic attraction that can provide lubricated 
attractions between the tiles and the substrate, but do not cause strong 
in-plane attractions compared to the thermal energy that would lead 
to the aggregation of tiles, could also be used. The sizes and shapes of 
the boundary walls can also be controlled lithographically, opening 
the door to visualizing entropic and steric effects in specially confined 
systems of shape-designed Brownian particles. 

As the desired level of complexity in multiscale materials increases, 
some limitations of previous self- and directed-assembly methods have 
become apparent. The litho-PAM method described here represents 
an alternative approach that can provide access to a wide range of two- 
dimensional multiscale materials composed of differently shaped 
mobile tiles, including fluctuating Brownian systems that have unusual 
symmetries and hierarchical structures. Thus, litho-PAMs can be used 
to create and study new equilibrium phases of such systems, including 
the fluctuating P2 quasi-crystal that we have demonstrated here. These 
new phases could potentially display unusual spatio-temporal dynamics 
at different length scales, such as the transient collective hopping motion

the image in b inside the yellow dashed square. The ten-fold azimuthal
intensity modulations are less pronounced. h, Fourier transform of the
image in c inside the yellow dashed square. The ten-fold azimuthal intensity
modulations are no longer visible, and the ring-like pattern indicates that
the tiles in the region are largely disordered, as in a glass or liquid. i, To
determine the melting area fraction φ_A,melt of the P2 quasi-crystal, we
plot φ_A,KM as a function of φ_A at various times after release: green circles,
24 h; orange diamonds, 48 h; purple triangles, 60 h. Error bars, s.d. At the
two longest times, we find nearly the same melting behaviour, yielding
an intercept with the horizontal axis of φ_A,melt ≈ 0.65. Lines, linear fits
(see Methods). 


of the PSKMs in our Brownian P2 quasi-crystal. Using litho-PAMs to 
systematically control the degree of coupling between different types 
of motif and the rest of the system is likely to lead to a broader under- 
standing of heterogeneous dynamics in complex multiscale Brownian 
systems, beyond disordered glassy systems. Moreover, the litho-PAM 
method opens up possibilities for exploring and directly visualizing 
the evolution of multiscale systems that have pre-assembled out-of- 
equilibrium initial states, which would otherwise be very difficult to 
create (see Methods section ‘Additional discussion’). Although our 
demonstration of litho-PAMs has focused on a quasi-crystal system 
consisting of only two tile shapes, a much larger number of different 
tile shapes could be simultaneously fabricated and pre-configured using 
litho-PAMs, yielding systems with even higher levels of multiscale 
morphological complexity, including sub-particle features in tile 
shapes, local multi-tile motifs and intricate and diverse superstructures. 


Online content 

Any Methods, including any statements of data availability and Nature Research report- 
ing summaries, along with any additional references and Source Data files, are available 
in the online version of the paper at https://doi.org/10.1038/s41586-018-0464-9.

METHODS


Mask design. Microscale kite-shaped and dart-shaped prismatic tiles are fabricated 
out of epoxy-based negative photoresist SU-8 (Microchem), and these tiles are 
spatially and orientationally organized before release using the following top-down 
photolithography process. A quartz-chrome lithographic mask (150 mm x 150 mm) 
containing a pattern of kite and dart tiles, based on Penrose’s P2 quasi-crystalline 
pattern, is designed at a desired tile area fraction φ_A,des. Each kite is a convex
quadrilateral, with two adjacent sides of designed shorter length C_des, two adjacent
sides of designed longer length D_des, a 144° interior angle between the two shorter
sides and other interior angles of 72° (Fig. 1a, inset). The ratio of the two side
lengths D_des/C_des is the golden ratio (1.618...). Each dart is a concave quadrilateral
with side lengths C′_des and D′_des that are nearly the same as C_des and D_des, and
interior angles of 216° (concave vertex), 72° (convex vertex opposite the concave
vertex) and two of 36° (convex vertices at the two sharp tips) (Fig. 1c, inset). A P2
quasi-crystalline pattern containing about 10° kite and dart tiles is digitally 
designed using lithographic layout software (L-Edit, Tanner Research). An idealized
P2 pattern is generated using thin lines, and these lines are subsequently
thickened so that the tiles outside these lines have the desired area fraction, in our
case φ_A,des ≈ 0.78. Then, we determine the coordinates of the vertices of the kite
and dart tiles just inside these thickened lines. We use C_des = 31.3 μm,
D_des = 50.7 μm, C′_des = 29.4 μm and D′_des = 47.6 μm in the quartz-chrome masks;
after five-times-reduction stepper optical lithographic printing, these mask dimensions
lead to ideal edge lengths of C ≈ 6.2 μm and D ≈ 10.1 μm for the prismatic
SU-8 kite particles, and of C′ ≈ 5.8 μm and D′ ≈ 9.5 μm for the prismatic SU-8 dart
particles. The overall lateral dimensions of the SU-8 tiles were chosen to ensure 
that diffraction during the UV printing process did not cause the tiles to become 
irreversibly bonded together via unwanted cross-linking in the spaces between 
them when printed at high densities. The separation between the printed SU-8 
tiles at φ_A ≈ 0.78 is uniformly about 1.2 μm, which is about four times the minimum
feature size that can be printed by the particular UV stepper that we used.
Thus, although our pre-assembled pattern of kite and dart tiles is inspired by the
classic P2 construction that involves infinitesimally thin lines, it is distinguishably
different from it because our variation enables us to set φ_A,des and to determine the
vertices of all constituent tiles in a well-defined manner. To confine the mobile kite
and dart particles and to maintain a fixed φ_A after release, we designed millimetre-scale
pentagonal-shaped confining walls. Each confining wall has a lateral
thickness of about 100 μm and an inner edge length of about 2 mm, making contact
with the outermost confined Penrose tiles. This confining boundary is also made 
of SU-8 photoresist and is printed together with the particles in the same exposure 
step. 
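
The mask-to-wafer arithmetic above can be checked with a few lines of Python. This is a sketch under stated assumptions rather than part of the original design flow: the mask dimensions and the five-times reduction come from the text, whereas the area expressions are derived here by splitting each tile along its symmetry axis into two congruent triangles (included angle 72° for the kite, 36° for the dart), and all identifiers are hypothetical.

```python
import math

REDUCTION = 5.0  # five-times-reduction stepper, as stated in the text

# Mask edge lengths in micrometres, as quoted in the text.
MASK_UM = {"C_des": 31.3, "D_des": 50.7, "C'_des": 29.4, "D'_des": 47.6}

GOLDEN_RATIO = (1.0 + math.sqrt(5.0)) / 2.0


def printed_length_um(mask_length_um: float, reduction: float = REDUCTION) -> float:
    """Ideal printed edge length after optical reduction."""
    return mask_length_um / reduction


def kite_area_um2(c_um: float, d_um: float) -> float:
    """Kite area: two triangles with sides c, d and an included angle of 72 degrees."""
    return c_um * d_um * math.sin(math.radians(72.0))


def dart_area_um2(c_um: float, d_um: float) -> float:
    """Dart area: two triangles with sides c, d and an included angle of 36 degrees."""
    return c_um * d_um * math.sin(math.radians(36.0))


if __name__ == "__main__":
    printed = {name: printed_length_um(value) for name, value in MASK_UM.items()}
    for name, value in printed.items():
        print(f"{name}: mask {MASK_UM[name]:.1f} um -> printed {value:.2f} um")
    print(f"kite D/C = {MASK_UM['D_des'] / MASK_UM['C_des']:.4f} "
          f"(golden ratio {GOLDEN_RATIO:.4f})")
    print(f"kite area = {kite_area_um2(printed['C_des'], printed['D_des']):.1f} um^2")
    print(f"dart area = {dart_area_um2(printed[chr(67) + chr(39) + '_des'], printed[chr(68) + chr(39) + '_des']):.1f} um^2")
```

Dividing by five gives 6.26, 10.14, 5.88 and 9.52 μm, which agree with the quoted approximate values of 6.2, 10.1, 5.8 and 9.5 μm to within the rounding used in the text.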

Lithographic production. To observe the kite and dart particles before, during 
and after release, we use a transparent glass wafer as a substrate and record in 
situ time-lapse images using bright-field transmission microscopy. A clean glass 
wafer (100-mm diameter, 500 μm thick, borosilicate) is pre-baked on a hot plate
at 200°C for 3 min to remove adsorbed moisture from its surface and then cooled 
to 25°C. Although other smooth substrates can be used (such as silicon wafers), 
we chose a glass wafer to facilitate visualization of the monolayer using an inverted 
optical microscope and because glass becomes negatively charged in the aqueous 
basic RSD used in a subsequent step to release the tiles (consequently, anionically 
stabilized tiles will not stick to the glass surface). Onto this glass wafer we spin- 
coat (Headway Research, PWM32 Spinner; 500 r.p.m. for 5 s at an acceleration of 
100 r.p.m. s⁻¹; 3,000 r.p.m. for 30 s at an acceleration of 300 r.p.m. s⁻¹) a release
layer of Omnicoat (Microchem), yielding an optically transparent thin layer with a 
thickness of about 10 nm after baking at 200°C for 1 min and then cooling to 25°C. 
Subsequently, we spin-coat a layer of SU-8-2002 (Microchem) negative photoresist 
on top of the Omnicoat layer and then bake at 95°C for 90 s, yielding an optically 
transparent, un-cross-linked solid layer of SU-8 that is approximately 2 μm thick
after cooling to 25°C. This coated wafer is exposed to patterned UV light (365 nm, 
typical energy dose of 180 mJ cm⁻²) through the designed photomask (Digidat,
150 mm x 150 mm x 6.25 mm, chrome on quartz) in a lithographic stepper 
(ASML, PAS 5500/200, five times reduction, Hg i-line). 

Our printing routine on the stepper yields 14 identical dies containing kite and 
dart particles within confining boundaries; these dies are evenly distributed on 
the wafer and separated by about 18 mm. The patterned UV light produced by the 
stepper triggers cross-linking of the oligomeric epoxy SU-8 photoresist, and the 
wafer is baked post-exposure at 95°C for 75 s to enhance the rate of cross-linking 
reactions. After cooling to 25°C, the post-exposure-baked wafer is submerged 
in organic SU-8 developer (1-methoxy-2-propyl acetate, Microchem) for 4 min 
to remove the un-cross-linked SU-8, rinsed with isopropyl alcohol to wash away 
any residual developer on the wafer and dried using a nitrogen gas stream. The 
result is a set of discrete prismatic kite and dart particles in a Penrose P2 pattern, 
bound to the Omnicoat release material on the wafer, yet entirely disconnected 
from each other, despite the high area fraction. To achieve this, we adjusted the 
energy dose and post-bake duration for SU-8 as described above, optimizing this
lithography process for transparent glass wafers. We designed the release process
so that the thin Omnicoat release layer dissolves in an aqueous solvent rather
than the organic solvent (cyclopentanone) or developer for the SU-8 resist layer,
so unexposed resist can be completely removed through development and the 
resulting SU-8 particles can be released independently. We kept the thickness of 
the layer of release material to a minimum (about 10 nm) that still provides full 
release of all tiles yet minimizes the amount of dissolved release material in the 
RSD and thereby reduces associated fluid flows due to concentration gradients. 
Transmission optical microscopy through the glass wafer in an inverted imaging 
configuration provides better-quality images of the SU-8 particles compared to 
reflection optical microscopy of the particles through a thicker layer of RSD above 
an absorbing and reflecting silicon wafer. The actual φ_A of printed tiles, which have
the same P2 arrangement as defined in the mask, can be adjusted over a limited
range near φ_A,des by tuning the energy dose of the exposure through the stepper
and by adjusting the post-exposure baking conditions. 

Releasing the tiles to form a fluctuating monolayer. To obtain large areas of stable 
fluctuating particles in a monolayer, controlling their release is essential. If the 
release occurs too rapidly, then concentration gradients of the release material can 
drive discrete particles out of the monolayer, leading to undesirable vacancies and 
defects in the overall structure. The composition of the RSD is therefore important, 
so including a stabilizing agent, to prevent irreversible aggregation of the tiles, as 
well as a depletion agent, to overcome out-of-plane entropic excitations and keep 
tiles in the monolayer, is typically necessary. 

After development of the SU-8 photoresist, but before release, around a given die 
on the dry glass wafer we build a solid square-frame enclosure of four elastomeric 
PDMS walls. The main purpose of these PDMS walls is to confine the RSD that 
is added later to release the SU-8 particles. We first make a solid disk-like layer of 
PDMS elastomer that is about 3 mm thick and 5.5 cm in diameter in a small Petri 
dish, then cut this into a square frame (inner edge length of 1.3 cm, lateral wall 
thickness of 0.2 cm), remove the square frame from the Petri dish and attach it to 
the glass wafer surrounding the die of SU-8 particles. To make the disk-like layer 
of PDMS, we mix a PDMS elastomer and its curing agent (Sylgard 184, weight 
ratio of 10:1 of elastomer:curing agent) on a clean Petri dish, gradually de-gas 
the PDMS at 25°C for 12 h, and then place the Petri dish in a vacuum oven at 
80°C for 2 h to remove any remaining entrapped bubbles and to cure the PDMS. 
Before attaching the PDMS walls to the glass wafer, we remove the Omnicoat layer 
outside the die region using a cotton-tipped applicator soaked in release solution. 
The PDMS walls form a water-impermeable bond with the glass wafer around a 
given die, and the bonded PDMS square frame and glass wafer effectively form a 
chamber (a PDMS well) into which the RSD can be loaded. An adequate volume 
of the RSD is placed in contact with the tiles attached to the release layer, so that 
components in the RSD are in excess and not consumed. A coverslip is placed on 
top of the PDMS walls and in contact with the RSD to inhibit evaporation of the 
RSD and to prevent convective flows during and after the release process. Inside 
the PDMS walls, the lithographically constructed SU-8 walls confine the particles 
to a fixed φ_A. The size and location of the PDMS walls can be chosen to anchor
the SU-8 walls to the glass wafer. 

To release the tiles slowly but preserve their pre-assembled organization in the 
monolayer without introducing strong in-plane attractions between them, we 
formulated an aqueous RSD that contains: a basic release agent (tetramethylam- 
monium hydroxide, TMAH; Sigma-Aldrich, 1% w/v) that dissolves Omnicoat, 
a stabilizing agent (sodium dodecyl sulfate, SDS; MP Biomedicals, ultrapure, 
10 mM) that prevents irreversible aggregation of released SU-8 particles, and 
a depletion agent (carboxylate-stabilized, surfactant-free polystyrene spheres, 
Magsphere, 52-nm diameter, 1% w/v solids) that strongly inhibits the released 
SU-8 particles from being entropically excited in the direction normal to the sur- 
face of the glass and out of the monolayer. We fill the PDMS well completely with 
this RSD and immediately place a coverslip over it to reduce evaporation, which 
can lead to undesirable convection. As the SU-8 particles are released, dodecyl 
sulfate anions adsorb onto their surfaces, providing stabilizing screened-charge 
electrostatic repulsions and preventing irreversible aggregation of the particles 
when they collide with each other in the monolayer. The depletion agent induces 
a roughness-controlled depletion attraction that causes a preferential attraction 
of a face of each SU-8 particle towards the flat smooth surface of the glass wafer, 
whereas in-plane attractions between rougher edges of particles are small compared
to thermal energy k_BT, where T ≈ 298 K is the temperature. We optimized
the concentrations of SDS, the diameter and volume fraction of the depletion agent, 
the thickness of the SU-8 particles, and the type and thickness of the release layer 
to keep all of the released particles in the monolayer for an extended period of 
time (at least several days). Because rapidly dissolving the Omnicoat can drive 
undesirable fluid flows that cause tiles to lift off vertically out of the monolayer, 
we adjusted the TMAH concentration so that tiles release gradually. Moreover, the 
size and volume fraction of the depletion agent were set to ensure that each tile 
experiences a strong facial depletion attraction with the smooth glass substrate that 
is substantially larger than $k_BT$; however, in-plane depletion attractions with other 
tiles are much smaller, because the edges of tiles are rougher than the faces (see 
supporting information in refs 7**>). The walls of the PDMS well were designed 
to be tall enough (3 mm high) to keep any residual convection near the coverslip 
away from the monolayer, thereby enabling time-lapse imaging of the Brownian 
system of tiles over many days. 

High-resolution time-lapse particle imaging by optical bright-field transmis- 
sion microscopy. After filling and covering the PDMS well, in situ images are taken 
before, during and after particle release using an inverted bright-field microscope 
(Nikon Eclipse TE2000) equipped with a 20× objective lens (NIKON CFI60 Plan 
Achromat, 0.4 numerical aperture), a 10× objective lens (NIKON Plan Achromat, 
0.25 numerical aperture) and a Nikon D5000 camera (4,288 pixels × 2,848 pixels, 
silent mode). The image contrast and resolution are sufficiently high that even 
for the lower-power 10× objective, the position and orientation of each particle 
can be readily seen over the entire field of view. Individual images showing a fixed 
region of Penrose tiles are taken at a rate of one frame every 60 s using a computer- 
automated camera control system. 

Measuring area fractions of Penrose tiles. The interiors of all tiles in an opti- 
cal micrograph are filled with coloured pixels using an edge-detection routine 
(paint bucket fill tool) in Photoshop, and a first estimate of the area fraction is 
determined by counting these coloured pixels and dividing by the total number 
of pixels. The area fraction of tiles from optical micrographs before release is 
$\phi_{A,\mathrm{OM}} = 0.677 \pm 0.009$. This value is roughly 13% lower than the designed area 
fraction on the mask of $\phi_{A,\mathrm{des}} = 0.78$, owing to optical diffraction during the 
lithographic printing process and to the dose response of the SU-8 photoresist. 
To improve this first estimate, we take and analyse a high-resolution SEM image 
(magnification 1,000×), which does not suffer from optical diffraction, to determine 
the area fraction of tiles, measured at their upper surfaces (tops), yielding 
$\phi_{A,\mathrm{SEMtop}} = 0.684 \pm 0.003$. The factor 
$f_{\mathrm{SEMtop-OM}} = \phi_{A,\mathrm{SEMtop}}/\phi_{A,\mathrm{OM}} = 1.01$ is used 
to correct for the uncertainties that result from optical diffraction from the particle 
edges in the optical micrographs. 

Because the SEM images of particles before release provide areas of only the tops 
of the tiles, they cannot reveal protruding features on the sides of the particles that 
correspond to edge roughness, a natural consequence of the photolithographic 
exposure and development processes. However, such edge roughness can affect the 
steric interactions between adjacent tiles. To measure a tile area fraction that could 
best be compared to future simulations of tiles with hard interactions, we use side 
or oblique SEM imaging of tiles that are lifted off the substrate after release to estimate 
the average edge roughness (about 180 nm). From these SEM measurements, 
we determine a correction factor for the area fraction as a consequence of edge 
roughness of $f_{\mathrm{ER}} = 1.13$. The measured area fraction $\phi_A$ is then determined from 
the filled tiles in the optical micrographs by combining the two correction factors: 
$\phi_A = f_{\mathrm{SEMtop-OM}} f_{\mathrm{ER}} \phi_{A,\mathrm{OM}}$. Thus, the $\phi_A$ values that we report include corrections 
for the dose response of the photoresist, blurring caused by optical diffraction and 
the roughness of the edges of the tiles. 
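
As a worked illustration of the pixel-counting estimate and the two-factor correction described above, the following minimal Python sketch may help. The function names and the toy mask are hypothetical and are not taken from the authors' workflow; only the values 0.677, 1.01 and 1.13 come from the text.

```python
def area_fraction(mask):
    """Fraction of pixels flagged as tile interior in a binary mask (rows of 0/1)."""
    total = sum(len(row) for row in mask)
    filled = sum(sum(row) for row in mask)
    return filled / total


def corrected_area_fraction(phi_om, f_semtop_om=1.01, f_er=1.13):
    """Apply the SEM-top and edge-roughness correction factors quoted in the text."""
    return f_semtop_om * f_er * phi_om


# Toy 4 x 4 mask (hypothetical): 10 of 16 pixels are filled.
toy_mask = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 1, 1, 0],
]
phi_toy = area_fraction(toy_mask)

# Value reported for the optical micrographs before release.
phi_a = corrected_area_fraction(0.677)
print(f"toy mask: {phi_toy:.3f}, corrected area fraction: {phi_a:.3f}")
```

With the reported inputs, the product of the two factors and the optical estimate comes to about 0.77.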

Fourier transforms of microscope images containing filled kite and dart tiles. 
Measured images in 24-bit RGB colour are converted to unsigned 8-bit greyscale 
and then cropped into a square region of 2,048 pixels x 2,048 pixels. To enhance 
contrast in an image, the interiors of the particles are filled with black and the 
regions outside the particles are filled with white in Photoshop. We use ImageJ 
to perform a 2D fast Fourier Transform (FFT) of these black-and-white images, 
yielding intensity and phase information. We display the FFT intensity using a 
colour lookup table, to convert the greyscale intensity values to colours. In some 
cases, we average the FFT intensities of several images taken at different times. 
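
The idea of turning a filled black-and-white image into a Fourier intensity map, and of averaging several such maps, can be sketched as follows. This is not ImageJ's FFT: a direct (slow) discrete Fourier transform is swapped in so that the example needs only the Python standard library, and the 8 × 8 striped test image and all function names are assumptions made for illustration.

```python
import cmath


def dft2_intensity(image):
    """Return |F(ky, kx)|^2 for a small 2D image via a direct (slow) DFT."""
    n_rows, n_cols = len(image), len(image[0])
    intensity = []
    for ky in range(n_rows):
        row_out = []
        for kx in range(n_cols):
            acc = 0j
            for y in range(n_rows):
                for x in range(n_cols):
                    phase = -2j * cmath.pi * (ky * y / n_rows + kx * x / n_cols)
                    acc += image[y][x] * cmath.exp(phase)
            row_out.append(abs(acc) ** 2)
        intensity.append(row_out)
    return intensity


def average_intensity(stack):
    """Element-wise mean of several equally sized intensity maps."""
    n = len(stack)
    n_rows, n_cols = len(stack[0]), len(stack[0][0])
    return [[sum(m[r][c] for m in stack) / n for c in range(n_cols)]
            for r in range(n_rows)]


# Hypothetical 8 x 8 test image: vertical stripes with a period of 4 pixels
# (0 = black particle interior, 255 = white background).
stripes = [[0 if (x % 4) < 2 else 255 for x in range(8)] for _ in range(8)]
power = dft2_intensity(stripes)

# Averaging two identical maps is a trivial check of the averaging step.
mean_power = average_intensity([power, power])
```

For this periodic test pattern the intensity is concentrated at the zero-frequency term and at the two wavenumbers that correspond to the 4-pixel stripe period, which is the discrete analogue of a pair of Bragg peaks.
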

Fitting azimuthally varying intensities of rays and peaks in Fourier transforms. 
In Fig. 2i, we fit the $\psi$-dependent intensities (azimuthal line shapes) 
of rays at high q using a single Gaussian function with a constant background 
$I_0$: $I(\psi) = I_0 + I_h \exp[-(\psi - \psi_p)^2/(2\sigma_\psi^2)]$, where $I_h$ is the peak intensity relative 
to $I_0$, $\psi_p$ is the mean peak angle and $\sigma_\psi$ is the standard deviation in angle 
(the effective azimuthal width of a ray). A fit to the data in Fig. 2i before release 
yields $I_0 = (1.48 \pm 0.13)$ × 10°, $I_h = (11.5 \pm 0.3)$ × 10°, $\psi_p = 50.6^\circ \pm 0.1^\circ$ and 
$\sigma_\psi = 2.6^\circ \pm 0.1^\circ$, with a correlation coefficient of $R^2 = 0.996$. A fit to the data 
in Fig. 2i after release yields $I_0 = (0.45 \pm 0.09)$ × 10°, $I_h = (1.70 \pm 0.11)$ × 10°, 
$\psi_p = 50.5^\circ \pm 0.3^\circ$ and $\sigma_\psi = 5.4^\circ \pm 0.5^\circ$, with $R^2 = 0.977$. Entropic fluctuations after 
release therefore cause a noticeable increase in the width and a decrease in the 
central intensity of the rays at high q. 
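
To make the broadening concrete, the sketch below evaluates a Gaussian line shape with a constant background and converts the two reported standard deviations in angle (2.6° before release, 5.4° after) into full widths at half maximum. The function names are hypothetical, the fitting itself is not reproduced, and the intensities are entered without their common power-of-ten prefactor.

```python
import math


def gaussian_line_shape(psi, i0, ih, psi_p, sigma):
    """I(psi) = I0 + Ih * exp(-(psi - psi_p)^2 / (2 sigma^2)); angles in degrees."""
    return i0 + ih * math.exp(-((psi - psi_p) ** 2) / (2.0 * sigma ** 2))


def fwhm(sigma):
    """Full width at half maximum of a Gaussian with standard deviation sigma."""
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma


# Angular widths reported for Fig. 2i (degrees).
sigma_before, sigma_after = 2.6, 5.4
fwhm_before = fwhm(sigma_before)
fwhm_after = fwhm(sigma_after)
broadening = sigma_after / sigma_before

# Check of the functional form using the reported values before release
# (background 1.48, peak 11.5, mean angle 50.6 deg; prefactor omitted).
peak_before = gaussian_line_shape(50.6, 1.48, 11.5, 50.6, sigma_before)
half_before = gaussian_line_shape(50.6 + fwhm_before / 2, 1.48, 11.5, 50.6, sigma_before)
print(f"FWHM before: {fwhm_before:.1f} deg, after: {fwhm_after:.1f} deg, ratio: {broadening:.2f}")
```

The ratio of the two widths shows that the rays at high q roughly double in azimuthal width after release.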

In Fig. 2j, for the FFT before release, we semi-empirically fit the azimuthal line 
shape at intermediate q using a double Gaussian function: 
$I_0 + I_{h1} \exp[-(\psi - \psi_p)^2/(2\sigma_{\psi 1}^2)] + I_{h2} \exp[-(\psi - \psi_p)^2/(2\sigma_{\psi 2}^2)]$, 
yielding $I_0 = (1.76 \pm 0.18) \times 10^{11}$, $I_{h1} = (8.2 \pm 1.6) \times 10^{11}$, 
$\psi_p = 50.7^\circ \pm 0.1^\circ$, $\sigma_{\psi 1} = 1.03^\circ \pm 0.21^\circ$, $I_{h2} = (5.5 \pm 1.6) \times 10^{11}$ 
and $\sigma_{\psi 2} = 3.18^\circ \pm 0.53^\circ$, with $R^2 = 0.996$. The double Gaussian function is necessary 
to fit the more pronounced Bragg-like peak before release. In Fig. 2j, after release 
this sharper peak has disappeared, so we fit the azimuthal line shape at intermediate 
q using only a single Gaussian function, as before, yielding $I_0 = (0.46 \pm 0.19) \times 10^{11}$, 
$I_h = (1.72 \pm 0.18) \times 10^{11}$, $\psi_p = 51.2^\circ \pm 0.4^\circ$ and $\sigma_\psi = 6.8^\circ \pm 1.0^\circ$, with $R^2 = 0.96$. 
Thus, the initial sharper Bragg-like nature of this peak before release has been 
markedly reduced long after release as a consequence of the Brownian fluctuations 
of the tiles in the P2 system. 
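
The reduction of the Bragg-like peak can be illustrated by evaluating the two fitted line shapes at their peak angles. The sketch below is an assumption-laden illustration: the function name is hypothetical, the peak value before release is read as 8.2 ± 1.6, and all intensities are entered without their common power-of-ten prefactor.

```python
import math


def double_gaussian(psi, i0, ih1, sigma1, ih2, sigma2, psi_p):
    """Constant background plus two Gaussians sharing the same mean angle psi_p."""
    d2 = (psi - psi_p) ** 2
    return (i0
            + ih1 * math.exp(-d2 / (2.0 * sigma1 ** 2))
            + ih2 * math.exp(-d2 / (2.0 * sigma2 ** 2)))


# Reported fit values at intermediate q before release (prefactor omitted).
before = dict(i0=1.76, ih1=8.2, sigma1=1.03, ih2=5.5, sigma2=3.18, psi_p=50.7)
# After release a single Gaussian suffices; this is the ih2 = 0 special case,
# so sigma2 is only an arbitrary placeholder here.
after = dict(i0=0.46, ih1=1.72, sigma1=6.8, ih2=0.0, sigma2=1.0, psi_p=51.2)

peak_before = double_gaussian(before["psi_p"], **before)
peak_after = double_gaussian(after["psi_p"], **after)

# Peak height above background, before release relative to after release.
contrast_ratio = (peak_before - before["i0"]) / (peak_after - after["i0"])
print(f"peak above background drops by a factor of about {contrast_ratio:.1f}")
```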

Tracking positions and orientations of darts in PSDMs. We wrote a customized 
particle-tracking routine in Mathematica (Wolfram Research, version 11.2) to 
track the centroids and pointing directions of the five darts in a fluctuating PSDM 
and reveal the local symmetry-breaking configurational fluctuations of this motif 
caused by Brownian excitations (Supplementary Video 4, Extended Data Fig. 5). 
This routine thresholds, binarizes and fills the five dart tiles. A combination of the 
MorphologicalComponents, EdgeDetect and ImageCorners functions is used to calculate 
the centroids of each of the five darts and to locate their three convex vertices 
(Extended Data Fig. 5a). The centroids are sorted anticlockwise and connected as 
motif vertices to form a convex pentagon (blue lines), the area and internal angles 
of which are computed in each frame. The centroids and convex vertices are used 
to calculate the pointing directions of the darts (red arrows displayed at motif 
vertices). To correct for a slight long-time drift in the trajectories of the darts, we 
determine an average position of the centroids of all five darts in each frame, and 
the individual trajectories of the darts are then calculated relative to this collective 
frame (Extended Data Fig. 5b). These trajectories exhibit an anisotropy that reflects 
the underlying five-fold nature of the PSDMs. Because a PSDM has more highly 
corrugated edges, it is more strongly rotationally coupled to the surrounding tiles 
than is a PSKM, so it does not exhibit collective rotational hopping motion (as was 
observed for PSKMs). 
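
The geometric steps that follow the image analysis (moving to the collective frame, sorting the centroids anticlockwise and computing the pentagon area) can be sketched in a few lines. This is a Python illustration under stated assumptions, not the Mathematica routine described above: the function names and the test frame are hypothetical, and the thresholding and corner-detection steps are not reproduced.

```python
import math


def collective_frame(centroids):
    """Subtract the mean position of all darts in a frame (drift correction)."""
    cx = sum(x for x, _ in centroids) / len(centroids)
    cy = sum(y for _, y in centroids) / len(centroids)
    return [(x - cx, y - cy) for x, y in centroids]


def sort_anticlockwise(points):
    """Order points by polar angle about the origin of the collective frame."""
    return sorted(points, key=lambda p: math.atan2(p[1], p[0]))


def polygon_area(vertices):
    """Shoelace formula; positive for anticlockwise vertex order."""
    n = len(vertices)
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return 0.5 * twice_area


# Hypothetical frame: a regular pentagon of circumradius 5 (arbitrary units),
# listed in scrambled order and displaced by a drift of (100, -40).
radius, drift = 5.0, (100.0, -40.0)
frame = [
    (drift[0] + radius * math.cos(2 * math.pi * k / 5 + 0.3),
     drift[1] + radius * math.sin(2 * math.pi * k / 5 + 0.3))
    for k in (3, 0, 4, 1, 2)
]
pentagon = sort_anticlockwise(collective_frame(frame))
area = polygon_area(pentagon)
```

Sorting by polar angle assumes that the mean position of the five centroids lies inside the pentagon, which holds for the convex pentagons considered here.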

Calculations and fits of the distributions of areas and internal angles of fluctuating 
PSDMs. We calculated normalized probability distributions on the basis 
of the recorded set of areas $A_{\mathrm{PSDM}}$ and internal angles $\beta_{\mathrm{PSDM}}$ of the pentagons of 
connected motif vertices given by the centroids of the five darts using all frames in 
Supplementary Video 4 (Extended Data Fig. 5c, d). Because these five darts cannot 
be compressed below a limiting area $A_0$, which corresponds to them touching, the 
area distribution $p_{A,\mathrm{PSDM}}$ is asymmetric and exhibits a strict cut-off at low areas; 
we fit it using a three-parameter normalized log-normal distribution (to simplify 
notation, we take $A = A_{\mathrm{PSDM}}$ and $p_A = p_{A,\mathrm{PSDM}}$): 

\[
p_A = \begin{cases}
\dfrac{\exp\left(-\{\ln[(A - A_0)/\delta A]\}^2/(2\gamma^2)\right)}{\sqrt{2\pi}\,\gamma\,(A - A_0)} & \text{for } A > A_0 \\[2ex]
0 & \text{for } A \le A_0
\end{cases}
\]


The results of the fit are displayed in Extended Data Fig. 5c as the black line, and 
the values of the fit parameters obtained are $A_0 = 52 \pm 7\ \mu\mathrm{m}^2$, $\delta A = 22 \pm 7\ \mu\mathrm{m}^2$ and 
$\gamma = 0.17 \pm 0.05$ ($R^2 = 0.994$). Thus, the PSDM exhibits substantial area fluctuations 
caused by Brownian excitations, reflecting local equilibrium density fluctuations 
of this particular motif within the P2 quasi-crystal system. These Brownian excita- 
tions also lead to local symmetry-breaking fluctuations of the PSDM, as revealed by 
the sequence of differently shaped blue pentagons in Supplementary Video 4 and 
Extended Data Fig. 5a; over time, this fluctuating blue pentagon exhibits distortions 
away from a perfect regular pentagon. To quantify these distortions, we calculate 
the normalized probability distribution of internal angles $p_{\beta,\mathrm{PSDM}}$ of this fluctuating 
blue pentagon (Extended Data Fig. 5d). We fit this distribution using a Gaussian 
function, yielding a mean of 69.3° and a standard deviation of 17.5° ($R^2 = 0.99$). 
A direct calculation using the un-binned list of internal angles gives similar results 
(mean, 72.0°; standard deviation, 15.9°), irrespective of the binning parameters 
used to create the distribution. Thus, the time-averaged configuration of darts 
within a PSDM does reflect the basic five-fold symmetry of this motif through the 
mean internal angle of 72°, corresponding to a regular pentagon. Nevertheless, 
Brownian excitations cause large local symmetry-breaking fluctuations in the 
instantaneous configurations of dart tiles about this symmetric mean, leading to 
a relatively large and readily observable standard deviation in the distribution of 
these internal angles. 
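
The three-parameter log-normal form used for the area distribution can be checked numerically. The sketch below, with hypothetical function names and a simple midpoint-rule integrator chosen for this illustration, evaluates the density with the central values of the reported fit (limiting area 52 μm², scale 22 μm², width parameter 0.17) and confirms that it integrates to one and vanishes at and below the cut-off.

```python
import math


def lognormal3(a, a0, delta_a, gamma):
    """Three-parameter log-normal density with a strict cut-off at a <= a0."""
    if a <= a0:
        return 0.0
    z = math.log((a - a0) / delta_a)
    return math.exp(-z * z / (2.0 * gamma ** 2)) / (
        math.sqrt(2.0 * math.pi) * gamma * (a - a0)
    )


def integrate(f, lo, hi, n=20000):
    """Composite midpoint rule."""
    h = (hi - lo) / n
    return h * sum(f(lo + (i + 0.5) * h) for i in range(n))


# Central values of the reported fit (areas in square micrometres).
A0, DELTA_A, GAMMA = 52.0, 22.0, 0.17
UPPER = A0 + 10 * DELTA_A  # far into the upper tail

norm = integrate(lambda a: lognormal3(a, A0, DELTA_A, GAMMA), A0, UPPER)
mean_numeric = integrate(lambda a: a * lognormal3(a, A0, DELTA_A, GAMMA), A0, UPPER)
# Standard result for a shifted log-normal: mean = A0 + dA * exp(gamma^2 / 2).
mean_closed_form = A0 + DELTA_A * math.exp(GAMMA ** 2 / 2.0)
```

The numerical mean agrees with the closed-form expression for a shifted log-normal distribution, which is a quick consistency check on the functional form.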

Melting kite and dart motifs in a P2 quasi-crystal. In Fig. 4, we fit the measured 
$\phi_A(d)$ at certain fixed times t after release using a modified Fermi-like function 
to capture the evolution of diffusive dynamics associated with melting as the P2 
quasi-crystal expands to occupy empty space after a confining wall is removed: 
$\phi_A(d) = \phi_A^{*}/\{1 + \exp[(d - d_0)/L]\}$, where $\phi_A^{*}$ is the plateau area fraction in the 
dense region (to the left), $d_0$ is a reference distance that roughly defines the position 
of the melting front and L is a characteristic length scale associated with the 
width of the melting front. We use similar Fermi-like fitting forms to fit data for 
the d-dependent area fractions of PSKMs ($\phi_{A,\mathrm{KM}}$) and PSDMs ($\phi_{A,\mathrm{DM}}$). All fitting 
parameters for $\phi_A$ (Fig. 4d), $\phi_{A,\mathrm{KM}}$ (Fig. 4e) and $\phi_{A,\mathrm{DM}}$ (Extended Data Fig. 6) at 
these times t are shown in Extended Data Table 1. 
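
The behaviour of the Fermi-like profile is easy to see by evaluating it directly. In the sketch below the function name and the parameter values are hypothetical and chosen only for illustration; the fitted values are those listed in Extended Data Table 1 and are not reproduced here.

```python
import math


def fermi_profile(d, phi_plateau, d0, length):
    """phi(d) = phi_plateau / (1 + exp((d - d0) / L))."""
    return phi_plateau / (1.0 + math.exp((d - d0) / length))


# Hypothetical parameters: plateau (dimensionless), front position and
# front width (both in micrometres).
PHI_PLATEAU, D0, L = 0.75, 200.0, 25.0

deep_in_dense_region = fermi_profile(D0 - 20 * L, PHI_PLATEAU, D0, L)
at_front = fermi_profile(D0, PHI_PLATEAU, D0, L)
far_beyond_front = fermi_profile(D0 + 20 * L, PHI_PLATEAU, D0, L)

# Distance over which phi falls from 90% to 10% of the plateau: 2 * ln(9) * L.
width_10_90 = 2.0 * math.log(9.0) * L
```

The profile equals the plateau value deep in the dense region, exactly half of it at the reference distance, and decays to zero beyond the front; the 10% to 90% width is about 4.4 times the characteristic length.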

6 SEPTEMBER 2018 | VOL 561 | NATURE | 101 

© 2018 Springer Nature Limited. All rights reserved. 

Determining the melting area fraction of the P2 tile system $\phi_{A,\mathrm{melt}}$. In Fig. 4i, 
we fit all data for $\phi_{A,\mathrm{KM}}(\phi_A) > 0$ using linear functions. The slopes of $\phi_{A,\mathrm{KM}}(\phi_A)$ 
24 h, 48 h and 60 h after release are 1.30 ± 0.04, 2.26 ± 0.14 and 2.42 ± 0.53, 
respectively, and the intercepts with the $\phi_A$ axis are 0.544 ± 0.006, 0.633 ± 0.004 and 
0.647 ± 0.009, respectively. At t = 48 h and t = 60 h after release, these intercepts 
are approximately the same, so we estimate that the area fraction at the melting 
point of the P2 quasi-crystal is $\phi_{A,\mathrm{melt}} \approx 0.65 \pm 0.02$. This value is associated with 
the melting of PSKMs; larger-scale superstructures of motifs become ill-defined 
when the fundamental motif structures become disorganized. 
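
The intercept with the horizontal axis follows from the slope and offset of each linear fit. A minimal sketch, assuming ordinary least squares and using synthetic noise-free points (hypothetical, generated from the slope and intercept reported for 48 h after release) in place of the measured data:

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and offset for y = slope * x + offset."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def x_intercept(slope, offset):
    """Value of x at which the fitted line crosses y = 0."""
    return -offset / slope


# Synthetic points on a line with slope 2.26 that crosses zero at 0.633.
phi_a = [0.66, 0.68, 0.70, 0.72, 0.74]
phi_km = [2.26 * (x - 0.633) for x in phi_a]

slope, offset = linear_fit(phi_a, phi_km)
phi_at_zero = x_intercept(slope, offset)
print(f"slope = {slope:.2f}, axis intercept = {phi_at_zero:.3f}")
```

Applied to the measured points at each time, the same two steps would give the slopes and axis intercepts quoted above, with uncertainties that this sketch does not estimate.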

Immediate melting of P2 kite and dart tiles at a lower area fraction. We also 
pre-assembled kite and dart tiles at a uniformly lower area fraction of $\phi_A \approx 0.53$ 
by creating even larger spaces between adjacent tiles in a Penrose P2 pattern mask 
design, but still enclosing all tiles using large confining walls. As the kite and dart 
tiles are released after adding the RSD, the P2 pattern melts and becomes a disordered 
liquid-like phase because the tiles have more empty space in which they can 
rotate and translate, even as they stay in the monolayer. This immediate melting 
behaviour at uniform $\phi_A \approx 0.53$ is consistent with the higher melting area fraction 
$\phi_{A,\mathrm{melt}} \approx 0.65 \pm 0.02$ that we determined above for the unconfined P2 system. 
Although specific tile area fractions were designed into the mask patterns, the 
area fraction of tiles printed using a particular pattern design can be reduced by 
lowering the exposure dose in the stepper down to as low as about 150 mJ cm$^{-2}$. At 
doses below this value, the cross-linking within the tiles is compromised 
and vertex rounding becomes extreme. This limited control over the area fraction 
of tiles at the printing stage could potentially be useful because it could eliminate 
the need to design a new mask pattern. 

Additional discussion. Although different top-down lithographic methods have 
been used to produce a wide variety of custom shape-designed colloidal parti- 
cles’"'!, these processes typically yield stable bulk dispersions of desired shapes 
in a liquid. These shapes have subsequently been used in depletion-driven*!-** 
and capillary-driven***> self-assembly experiments, providing insights into how 
features in the shapes of particles can influence self-assembly. For instance, litho- 
graphic mutations of sub-particle features of colloidal chiral C-shapes that resemble 
proteins, known as proteoids, have been used to control the entropic hierarchical 
self-assembly of dimer crystals under slow crowding in two dimensions as rough- 
ness-controlled depletion attractions keep the fluctuating monolayer intact**. 
Nevertheless, the sizes of crystallites of such self-assembled structures are typi- 
cally small, and a high defect density often accompanies this type of self-assembly. 
Although such earlier studies have provided important insights into the role of 
core shape and entropy in self-assembly of gradually crowded systems, including 
protein crystallization, these limitations have precluded the widespread use of such 
self-assembled structures. By contrast, litho-PAMs can be used to produce and 
study complex fluctuating multiscale systems of mobile tiles at high tile densities, 
as we have demonstrated using optical stepper lithography and optical microscopy. 
Thus, litho-PAMs are very different from previous 2D self-assembly experiments 
in which monolayers of shape-designed lithographic colloids have been randomly 
deposited at dilute area fractions in a monolayer and subsequently concentrated 
slowly using gravitational sedimentation in the presence of roughness-controlled 
depletion attractions”>. The tilted-cell approach creates a gradient in particle area 
fraction throughout; by contrast, the litho-PAM method provides a uniform area 
fraction over a very large surface area. Thus, pre-assembly at uniform $\phi_A$ avoids 
potential issues of out-of-equilibrium jamming during crowding as well as the spatial 
gradients in $\phi_A$ that are inherent in the tilted-cell method of 2D self-assembly. 

Achieving a monolayer of fluctuating kite and dart tiles in a Penrose P2 qua- 
si-crystalline pattern*” that is defect-free over large areas and in which single-par- 
ticle and multi-particle collective dynamics can be readily visualized advances the 
experimental science of multiscale complex systems of mobile colloidal particles. 
Moreover, this achievement demonstrates that new equilibrium phases composed 
of many differently shaped and configured building blocks can be readily pro- 
duced using top-down parallel fabrication methods, circumventing bottom-up 
self-assembly methods and serial directed-assembly methods. Litho-PAMs can 
even enable the creation and study of different individual ground-state configu- 
rations for systems that have degenerate ground states with the same free energy 
yet different polymorphic organizations of tiles. Such degeneracy would usually 
preclude simple self-assembly methods from producing different desired ground- 
state configurations, because different ground-state polymorphs with essentially 
the same energy would nucleate and grow in different local regions**. 

Until now, attempts to create a large-scale fluctuating Penrose quasi-crystal of 
mobile colloidal tiles, which are organized with five-fold symmetry and have effec- 
tively hard interactions, have not succeeded. Further, no experimental assembly 
method has been able to produce a large-scale fluctuating colloidal version of the 
five-fold Penrose P2 quasi-crystal*® that can be observed in real-space with parti- 
cle-scale resolution until now. In a previous report of a self-assembled soft-matter 
quasi-crystal of nanoparticles that were crowded through evaporation, the crys- 
tallites were limited to 12-fold”* and 18-fold symmetry*. It has been proposed 
and investigated in simulations‘ that a Brownian Penrose quasi-crystal could 
be self-assembled by introducing selective site-specific edge-edge interactions 
between the tiles; however, this has not been realized experimentally, probably 
because of the substantial complexity of creating the required variety of selective 
edge-edge interactions on real particles. By contrast, the litho-PAM approach over- 
comes these limitations, enabling us to produce the quasi-crystal presented here. 
Moreover, we have visualized the equilibrium fluctuations of this quasi-crystal, 
including heterogeneous collective dynamics of certain motifs, as well as melting 
by slowly lowering the tile density. Our ability to observe mobile tiles fluctuating 
in equilibrium using optical microscopy has enabled us to identify the hallmarks 
of the pentatic liquid quasi-crystalline phase of matter, which is analogous to the 
fluctuating hexatic phase of mobile hard disks subject to Brownian excitations. This 
provides in situ dynamic information that cannot be obtained by studying a static 
solid film of nanoparticles using electron beam microscopy after evaporating the 
continuous liquid phase. 

Fundamental questions about spatio-temporal dynamics arise when considering 
complex multiscale materials composed of mobile building blocks, such as the 
P2 quasi-crystal system. Traditional models of phase behaviour and equilibrium 
fluctuations, such as Onsager-like cage models, do not necessarily apply to multi- 
scale systems that have a large variety of dynamic motifs and patterns that can be 
hundreds or more times larger than the smallest particles. Understanding equi- 
librium and non-equilibrium dynamics of multiscale systems that have differently 
organized structures at different length scales, including those well beyond the 
colloidal scale, remains a challenge theoretically and experimentally. For example, 
directly applying the notions of Kosterlitz-Thouless theory” to a fluctuating P2 
quasi-crystal is challenging; yet, on the basis of our observations, doing so would 
probably show the emergence of liquid-crystalline-like features in correlation functions 
and order parameters at smaller length scales for values of $\phi_A$ between the 
disordered liquid state at low $\phi_A$ and the full-tiling limit as $\phi_A \to 1$. Just as disks can 
form a hexatic liquid-crystal phase”’ over a certain range of $\phi_A$, our observations 
indicate that liquid-crystal-like modulations in the Fourier transforms develop for 
Brownian quasi-crystals of hard P2 tiles at intermediate wavenumbers q. Yet, we 
also find that the degree of fluctuation-induced smearing of quasi-crystal Bragg 
peaks at our smallest observable q, corresponding to the largest length scales, is 
not as large as the degree of fluctuation-induced smearing at intermediate q. These 
features differentiate the Fourier transforms of the fluctuating pentatic liquid qua- 
si-crystalline phase from the q-independent perfect Bragg peaks associated with 
crystallography of ideal static quasi-crystals constructed geometrically“. The 
&mso-pcr(r) that we developed (see Supplementary Methods) is a first attempt 
at quantifying superstructural orientational order in hierarchically organized 
multiscale materials in real space, and this concept can be further broadened 
and generalized. This brings up an important theoretical question for multiscale 
materials, which can have different symmetries and potentially incommensurate 
organizational structures at different length scales, of what is meant by short-, 
intermediate- and long-range order. Using the Brownian P2 quasi-crystal as an 
example, we believe that it may be necessary to correlate the type and degree of 
order with a range of length scales (or equivalently range of q) in multiscale materi- 
als, and possibly even spatial locations; any new theory would seek to couple these 
differently structured regions together self-consistently. Furthermore, a theoretical 
exploration of melting in hierarchically organized systems of differently shaped 
fluctuating building blocks could enable a direct comparison with our experimental 
results of melting. At a first level, such a theoretical exploration could be based 
around a multi-body free energy that is entirely entropic in origin, by considering 
the entropy of allowed microstates of positions and orientations of all tiles subject 
to the constraints of non-overlap. 
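
One conventional way to write such an entirely entropic free energy, offered here as a sketch and not as the authors' formulation, is through the configurational integral over non-overlapping states of N hard tiles. The symbols $Z_N$, $N_s$, $\mathbf{r}_i$, $\theta_i$ and $\chi_{ij}$ are notation introduced for this illustration only:

```latex
\[
F = -k_B T \ln Z_N + \text{const}, \qquad
Z_N = \frac{1}{\prod_s N_s!} \int \prod_{i=1}^{N} \mathrm{d}^2 r_i \, \mathrm{d}\theta_i
\prod_{i<j} \bigl[ 1 - \chi_{ij}(\mathbf{r}_i, \theta_i; \mathbf{r}_j, \theta_j) \bigr]
\]
```

Here $\mathbf{r}_i$ and $\theta_i$ are the position and orientation of tile i, $N_s$ is the number of tiles of shape s (kites or darts), and $\chi_{ij}$ equals 1 if tiles i and j overlap and 0 otherwise. Because every allowed configuration has the same energy, $F = -TS$ up to a constant, so differences in free energy between states arise only from differences in the number of accessible non-overlapping configurations.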

For classic solid-state atomic quasi-crystals, the notion of phason strain* can 
be used to explain real-space experimental observations that positions of certain 
atoms do not conform to a perfect ideal quasi-crystal structure*®. To analyse these 
images, quasi-crystal tiles are decorated with dots and lines to represent atoms and 
the bonds between atoms, respectively. Alternative tilings, which are not perfect 
Penrose quasi-crystals and do not follow standard matching rules everywhere but 
still fill space, can be made with these decorated tiles, thereby creating different 
local isomorphs. These alternative tilings typically lead to the phenomenon of 
phason strain when atom and bond decorations in adjacent tiles do not match 
up. However, in the case of our quasi-crystal, the hard tiles are the fundamental 
particles, so there are neither atoms nor covalent bonds between constituent tiles. 
Thus, atomic interpretations of phason strain, which have been used to explain 
phenomena in solid-state quasi-crystals with very strong attractive interactions 
between constituent atoms, are not directly applicable to systems of dispersed 
hard tiles. Instead, a description assuming an entirely entropic free energy, as pre- 
viously stated, would be most appropriate for predicting the equilibrium prop- 
erties and fluctuations of such systems. It would be possible to print alternative 
local isomorphs that do not necessarily follow matching rules for perfect Penrose 
quasi-crystals using the litho-PAM method, but the free energies of tile systems 
containing these isomorphs may be degenerate because there are no substantial 
in-plane attractive interactions between the tiles. In the future, litho-PAMs could 
be used to introduce different types of defect in the tilings, such as dislocations and 
disclinations, which could have a substantial effect on a free energy of the tiles that 
is primarily entropic in origin. 

It is interesting to consider the motifs and superstructures of motifs that we 
have identified through configurations of P2 tiles at different scales in light of 
previous work on coverings and quasi-unit cells. Overlapping decorated decagonal 
tiles has been shown” to be a useful method of generating pentagonal quasi- 
crystalline tilings according to prescribed geometrical rules that govern the overlap. 
These overlapping decagonal systems were referred to as ‘coverings’, to distinguish 
them from ‘tilings’, which typically do not have overlap. Maximizing the density 
of decorated quasi-unit cells was introduced"! as a simple yet powerful, general 
method of generating quasi-crystalline tilings. Using this quasi-unit cell approach, 
it is possible to explain, for example, the formation of Al$_{72}$Ni$_{20}$Co$_{8}$ quasi-crystals 
observed“ using high-angle annular dark-field imaging. The C-clusters (which 
represent an energetically preferred low-energy atomic cluster, implying strong 
attractive interactions between atoms compared to $k_BT$) of the quasi-unit cells*! 
are overlapped, and when their density is maximized a Penrose tiling is produced. 
Thus, in these previous works, overlap of decagons or of quasi-unit cells is needed 
to produce quasi-crystalline coverings or tilings. By contrast, our system, which is 
composed of non-overlapping kite and dart tiles in a monolayer at an area fraction 
of less than unity, is formally neither a covering nor a close-packed tiling (which 
would correspond to $\phi_A = 1$). 

Data availability. The data shown in the figures and that support the findings 
of this study are available from the corresponding author on reasonable request. 

[Figure: side-view schematic of the release process. Panel labels: fill with 
release solution-dispersion (RSD); Omnicoat release layer dissolves; SU-8 tiles 
are released (first-order kinetics); SU-8 tiles are stabilized by adsorbed DS⁻; 
depletion agent in RSD creates anisotropic roughness-controlled depletion 
attractions (lubricated out-of-plane attraction between smoother faces of tiles 
and glass surface; in-plane depletion attractions strongly suppressed by rougher 
edges of tiles); random Brownian fluctuations excite monolayer of mobile, 
pre-configured, custom-shaped colloidal tiles.] 

Extended Data Fig. 1 | Detailed sequence of steps for fabricating and 
releasing a dense litho-PAM of mobile shape-designed tiles. The tiles 
(purple), which are composed of cross-linked polymeric SU-8 photoresist, 
are released after exposure and development from the glass wafer (black 
hatching) using an RSD (blue) that contains a release agent (TMAH) to 
dissolve the Omnicoat release layer (green), a stabilizing agent (SDS) that 
adsorbs onto released tiles and prevents their aggregation, and a depletion 
agent (nanoscale polystyrene spheres) that strongly inhibits the released 
tiles from leaving the monolayer as a consequence of Brownian excitations 
(steps are shown as side views). 

Extended Data Fig. 2 | Examples of organized superstructural sets 
of motifs extending over different length scales in the P2 quasi-crystal 
before and after release. a, Regular centre-filled pentagonal 
superstructural set of one central and five outer wheel motifs (each wheel 
motif consists of ten kites (filled blue) and five darts (filled red)) before 
release (0 h, leftmost) and after release (6 h, 12 h, 24 h, 36 h and 48 h, left 
to right). b, Regular decagonal superstructural set of ten wheel motifs 
before release (0 h, leftmost) and after release (12 h, 24 h and 48 h, left to 
right). c, Regular icosagonal (that is, a 20-sided polygon) superstructural 
set of 20 wheel motifs before release (0 h, left) and after release (48 h, 
right). Scale bar (20 μm) shown in a is the same for all images. 

Extended Data Fig. 3 | Restructuring of Penrose kite tiles in a P2 quasi-crystal 
after release. a–d, Before release; e–h, after release. a, Kite tiles are 
separated by post-processing of the micrograph image from Fig. 2a. Scale 
bar, 20 μm. b, Effective scattering pattern, given by the Fourier transform 
intensity of a, showing ten rays extending from the centre to high 
scattering wavenumbers q. c, Central region of b, magnified by a factor of 
about six, revealing Bragg peaks at intermediate and low q, corresponding 
to large distances. d, Central region of c, magnified by a factor of about 
two, revealing Bragg peaks at very low q associated with superstructural 
ordering of motifs of kite tiles over large length scales. e, Brownian 
fluctuations near equilibrium (48 h after release) have caused kite tiles to 
deviate from the original perfect quasi-crystal order seen in the unreleased 
structure; kites are separated from Fig. 2e. f, Fourier transform intensity 
of e. Rays have broadened azimuthally as a consequence of Brownian 
fluctuations. g, h, Close-ups of f over the same q ranges as for c and d, 
respectively, showing the smearing of Bragg peaks into ten-fold azimuthal 
intensity modulations, reminiscent of liquid-crystalline materials, 
indicating a retention of quasi-crystalline orientational order. Intensity 
colour scale is the same as in Fig. 2k. 

Extended Data Fig. 4 | Restructuring of Penrose dart tiles in a P2 quasi-crystal 
after release. a–d, Before release; e–h, after release. a, Dart tiles are 
separated by post-processing of the micrograph image from Fig. 2a. Scale 
bar, 20 μm. b, Effective scattering pattern, given by the Fourier transform 
intensity of a, showing ten rays extending from the centre to high 
scattering wavenumbers q. c, Central region of b, magnified by a factor of 
about six, revealing Bragg peaks at intermediate and low q, corresponding 
to large distances. d, Central region of c, magnified by a factor of about 
two, revealing Bragg peaks at very low q associated with superstructural 
ordering of motifs of dart tiles over large length scales. e, Brownian 
fluctuations near equilibrium (48 h after release) have caused dart tiles to 
deviate from the original perfect quasi-crystal order seen in the unreleased 
structure; darts are separated from Fig. 2e. f, Fourier transform intensity 
of e. Rays have broadened azimuthally as a consequence of Brownian 
fluctuations. g, h, Close-ups of f over the same q ranges as for c and d, 
respectively, showing the smearing of Bragg peaks into ten-fold azimuthal 
intensity modulations, reminiscent of liquid-crystalline materials, 
indicating a retention of quasi-crystalline orientational order. Intensity 
colour scale is the same as in Fig. 2k. 

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Extended Data Fig. 5 | Tracking anisotropic, bounded Brownian 
fluctuations of darts in a PSDM. a, Filled and thresholded optical 
micrographs of darts in individual video frames (frame number in upper 
right) are overlayed with blue pentagons with vertices at the centroids 

of the darts (see Methods, Supplementary Video 4). Red arrows at the 
vertices of the blue pentagon denote the pointing directions of the darts. 
The shapes of the blue pentagons fluctuate over time and at any given 
instant can deviate substantially from a regular pentagon as a consequence 
of Brownian excitations of the P2 system. Actual time between frames 

is 720 s. Scale bar, 3m. b, Trajectories of the centroids of five darts in a 
PSDM over a duration of 32 h, after correcting for a slight long-time drift 
of the entire motif. The time-average position of each dart is denoted 

by a plus symbol overlaid on each trajectory; the centre of the PSDM is
given by a crossed box symbol at the centre. These trajectories have
non-circular shapes, indicating that the bounded Brownian motion of the

darts is anisotropic, reflecting the local five-fold time-averaged quasi- 
crystal symmetry of the motif. Considering an ensemble average over all 
five darts, standard deviations of step size distributions projected along 
directions between the centre of the motif and the time-averaged centroids 
of the five darts are 1.3 ± 0.2 times less than standard deviations of step
size distributions projected perpendicular to these five directions. This 
small yet detectable anisotropy in the bounded diffusion of the darts 
reflects the underlying time-averaged five-fold symmetry of their local 
quasi-crystal environment. Scale bar, 3 μm. c, d, Normalized probability
distributions of the calculated area A_PSDM (c) and internal angles β_PSDM (d)
of the fluctuating blue pentagons in Supplementary Video 4 and a. Black
lines, fits using a log-normal distribution (c) and a Gaussian distribution 
(d); see Methods for functional forms and fit parameters. 
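
The anisotropy measure described in b can be sketched as follows. This is a minimal illustration, not the authors' tracking code: the function name `projected_step_stds` and the example track are hypothetical, and positions are assumed to be already drift-corrected. Frame-to-frame steps are projected onto the direction from the motif centre to the dart's time-averaged centroid and onto the perpendicular direction, and the two standard deviations are compared.

```python
import math
import statistics


def projected_step_stds(track, centre):
    """Return (radial_std, perpendicular_std) of the steps of one dart.

    track  -- list of (x, y) centroid positions in successive frames
    centre -- (x, y) position of the motif centre
    """
    # Time-averaged centroid of the dart.
    mean_x = statistics.fmean(p[0] for p in track)
    mean_y = statistics.fmean(p[1] for p in track)
    # Unit vector pointing from the motif centre to that centroid.
    rx, ry = mean_x - centre[0], mean_y - centre[1]
    norm = math.hypot(rx, ry)
    ux, uy = rx / norm, ry / norm

    radial, perpendicular = [], []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        dx, dy = x1 - x0, y1 - y0
        radial.append(dx * ux + dy * uy)          # step component along the radial direction
        perpendicular.append(-dx * uy + dy * ux)  # step component perpendicular to it
    return statistics.pstdev(radial), statistics.pstdev(perpendicular)


# Hypothetical track (arbitrary units): the dart hops further sideways than radially.
track = [(3.0, 0.0), (3.1, 0.3), (2.9, -0.3), (3.0, 0.0)]
radial_std, perp_std = projected_step_stds(track, centre=(0.0, 0.0))
anisotropy = perp_std / radial_std  # values above 1 mean larger perpendicular steps
```

Averaging this ratio over the five darts of a motif would give an ensemble value comparable in spirit to the factor quoted in the caption.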

Extended Data Fig. 6 | Melting of PSDMs in an unconfined P2 quasi-crystal.
The area fraction of PSDMs φ_A,PM (red darts in Fig. 4a–c) decays
to zero at larger distances d, towards the direction where the quasi-crystal
is not confined (right). The motif melting front moves from right to left
over time: black circles, 4 h after release; orange diamonds, 24 h after
release; purple triangles, 48 h after release. Fits use a Fermi-like
function (see Methods); fit parameters are given in Extended Data Table 1.

Extended Data Table 1 | Fit parameters of the Fermi-like functions that describe the dependence of φ_A, φ_A,XM and φ_A,PM on d during melting of the P2 quasi-crystal

Data fit     Fitting parameter   t = 4 h          t = 24 h         t = 48 h
φ_A(d)       φ_A*                0.770 ± 0.001    0.768 ± 0.003    0.762 ± 0.002
             d_0 (μm)            846 ± 16         744 ± 9          773 ± 5
             L (μm)              98 ± 7           95 ± 7           162 ± 5
             R²                  0.994            0.994            0.999
φ_A,XM(d)    φ_A,XM*             0.314 ± 0.003    0.285 ± 0.005    0.269 ± 0.013
             d_0,XM (μm)         750 ± 35         567 ± 6          387 ± 16
             L_XM (μm)           79 ± 22          49 ± 6           66 ± 13
             R²                  0.921            0.989            0.981
φ_A,PM(d)    φ_A,PM*             0.058 ± 0.001    0.062 ± 0.001    0.029 ± 0.003
             d_0,PM (μm)         691 ± 7          517 ± 5          348 ± 29
             L_PM (μm)           41 ± 8           30 ± 4           72 ± 24
             R²                  0.953            0.995            0.953

Fits are shown in Fig. 4d, e and Extended Data Fig. 6; see Methods for the fitting functions and definitions of the fit parameters. R², correlation coefficient; t, time after initiation of release.
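
The exact fitting function is defined in the Methods and is not reproduced here. As a rough illustration only, the sketch below assumes the common Fermi form, a plateau that falls to zero over a width L around a front position d_0. The function name `fermi_like` and this functional form are assumptions; the parameter values are the φ_A row at t = 4 h from the table above.

```python
import math


def fermi_like(d, plateau, d0, width):
    """Assumed Fermi-like profile: ~plateau for d << d0, ~0 for d >> d0."""
    return plateau / (1.0 + math.exp((d - d0) / width))


# Parameters taken from the table (first data fit, t = 4 h); distances in micrometres.
plateau, d0, width = 0.770, 846.0, 98.0
at_front = fermi_like(d0, plateau, d0, width)       # half the plateau value at the front
deep_inside = fermi_like(0.0, plateau, d0, width)   # close to the plateau
far_outside = fermi_like(2000.0, plateau, d0, width)  # close to zero
```

Under this assumed form, d_0 marks where the fitted quantity has dropped to half its plateau, so a decreasing d_0 over time corresponds to a front moving towards smaller d.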

Climate-induced changes in continental-scale soil
macroporosity may intensify water cycle 


Daniel R. Hirmas*, Daniel Giménez, Attila Nemes, Ruth Kerry, Nathaniel A. Brunsell & Cassandra J. Wilson


Soil macroporosity affects field-scale water-cycle processes, such 
as infiltration, nutrient transport and runoff¹,², that are important
for the development of successful global strategies that address
challenges of food security, water scarcity, human health and loss
of biodiversity³. Macropores, large pores that freely drain water
under the influence of gravity, often represent less than 1 per cent
of the soil volume, but can contribute more than 70 per cent of the
total soil water infiltration⁴, which greatly magnifies their influence
on the regional and global water cycle. Although climate influences 
the development of macropores through soil-forming processes, the 
extent and rate of such development and its effect on the water cycle 
are currently unknown. Here we show that drier climates induce the 
formation of greater soil macroporosity than do more humid ones, 
and that such climate-induced changes occur over shorter timescales 
than have previously been considered—probably years to decades. 
Furthermore, we find that changes in the effective porosity, a proxy 
for macroporosity, predicted from mean annual precipitation at the 
end of the century would result in changes in saturated soil hydraulic 
conductivity ranging from −55 to 34 per cent for five physiographic
regions in the USA. Our results indicate that soil macroporosity may 
be altered rapidly in response to climate change and that associated 
continental-scale changes in soil hydraulic properties may set up 
unexplored feedbacks between climate and the land surface and thus 
intensify the water cycle. 

At a continental scale, the biogeochemical and mechanical processes
responsible for the formation of macropores² (for example, soil
biological activity, tillage practices and the natural formation of soil 
aggregates) are mediated by climatological factors, such as precipitation 
and temperature, that largely control soil moisture and energy fluxes. 
By inference, therefore, the formation of macropores in the soil must 
also be mediated by climate. Given the control of soil macroporosity 
over water cycling between the land and atmosphere, predicting 
its response to future climate scenarios is crucial. This prediction is 
currently impossible, however, because the degree and rate at which 
climate influences soil macroporosity are unknown. 

Moreover, research examining the response of the land surface to 
climate change has commonly assumed that hydraulic properties of the 
soil that control water flux through the vadose zone, such as saturated 
hydraulic conductivity, remain static over timescales relevant to climate 
simulations⁵,⁶. Climosequence studies, by contrast, indicate that soils
aggregate into distinct structural units under the prevailing soil climatic
conditions, forming interaggregate macropores through chemical and
physical processes (such as the formation of stable associations of
organic matter, clay and silt particles⁷ and wet-dry and freeze-thaw
cycles⁸,⁹) that can affect macroporosity seasonally. Biopores created
via root penetration¹⁰ and faunal burrowing¹¹ are also influenced by
climate. The overall rate at which macropores develop from the
combination of these processes is unknown, although it could be as rapid
as 15 years in organic soils¹².

Here we investigate the implicit assumption that soil macroporosity
(and, thus, soil hydraulic properties) is static over the timescales of
end-of-century model simulations by examining continental-scale
effects of climate on the development of effective porosity using a 
large database of soils sampled across the USA over approximately 
the past 50 years. Although multiple definitions have been proposed 
for effective porosity, we consider it to be the difference between total 
porosity and field capacity¹³. Field capacity is the water content of
an initially saturated soil that has ceased to drain owing to a balance 
between soil matric potential and gravitational forces. Because this 
balance is often assumed to be represented by a soil water pressure 
potential of −33 kPa, corresponding to pores of approximately 9 μm
in diameter, field capacity represents the volume fraction of soil pores 
smaller than this diameter. As the difference between total porosity 
and field capacity, therefore, effective porosity is a simple but useful 
proxy of macroporosity, representing the volume fraction of the largest 
pores in the soil¹³.
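
The correspondence between a pressure potential of −33 kPa and a pore diameter of about 9 μm follows from the Young-Laplace relation for a cylindrical capillary, d = 4γ cos θ / |ψ|. The sketch below is an illustration under stated assumptions: the surface tension of water near 20 °C (0.0728 N m⁻¹) and a zero contact angle are assumed defaults, and the function name is hypothetical.

```python
import math


def equivalent_pore_diameter_um(pressure_kpa, surface_tension=0.0728, contact_angle_deg=0.0):
    """Diameter (micrometres) of the largest pore that stays water-filled at a given potential.

    Young-Laplace relation for a cylindrical capillary: d = 4 * gamma * cos(theta) / |psi|.
    surface_tension is in N/m; pressure_kpa is the (negative) soil water pressure potential.
    """
    pressure_pa = abs(pressure_kpa) * 1000.0
    cos_theta = math.cos(math.radians(contact_angle_deg))
    diameter_m = 4.0 * surface_tension * cos_theta / pressure_pa
    return diameter_m * 1e6


d_field_capacity = equivalent_pore_diameter_um(-33.0)     # roughly 8.8, i.e. about 9 micrometres
d_wilting_point = equivalent_pore_diameter_um(-1500.0)    # roughly 0.19 micrometres
```

Pores wider than the field-capacity diameter drain under gravity, which is why the difference between total porosity and field capacity isolates the largest pores.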

Soil data were obtained from the National Cooperative Soil Survey 
(NCSS) Characterization Database, which contains information on soils 
sampled and measured by the US Department of Agriculture's Natural
Resources Conservation Service (USDA-NRCS) and cooperating 
laboratories from the late 1960s to the present. Although other soil 
databases have been compiled from various sources and research pro- 
jects, a major advantage of the NCSS database is that soils were analysed 
by adhering strictly to well documented standard operating procedures, 
making the soil information highly reliable and thus comparable, 
despite the widespread sampling dates and geographic distribution of 
the samples (Extended Data Fig. 1). 

Macroporosity is strongly dependent on particle-size distribution 
and the fraction of soil carbon¹⁴. To evaluate the influence of climate on
effective porosity, therefore, we treated the sand, clay and soil organic 
matter contents of each selected soil as covariates and calculated the 
residual effective porosity as the difference between measured effec- 
tive porosity and the effective porosity predicted from a spatial-error 
regression model using these covariates. To assess climatic controls of 
soil macroporosity with depth, we grouped soil samples by morphology 
into surface layers (A horizons) that occurred within 0–25 cm of the
land surface and subsurface layers (B horizons) that occurred within
25–100 cm of the land surface (Extended Data Table 1). To minimize
effects from confounding variables, morphological horizons that were 
designated from field excavations to contain concentrations of aggre- 
gating agents, such as carbonate (for example, Bk), or were indurated 
(for example, Bqm) were not considered in this study. We additionally 
selected ploughed surface layers (Ap horizons; Extended Data Table 1, 
Extended Data Fig. 1) to assess effects of climate on the rate of regeneration
of macroporosity disturbed by tillage. The adoption of non-inversion
tools over the past 50 years led to reduced mechanical
disturbance of Ap horizons¹⁵ and facilitated the expression of climate
on macroporosity in ecosystems that are uniformly managed and support
similar vegetation across climatic zones¹⁶. An unknown fraction
of these Ap samples probably came from irrigated fields; this additional 
water may have reduced, but not eliminated, soil moisture differences 
compared to samples unaffected by tillage. 

Atmospheric Science, University of Kansas, Lawrence, KS, USA. Present address: KRNV News 4, Reno, NV, USA. *e-mail: daniel.hirmas@ucr.edu


100 | NATURE | VOL 561 | 6 SEPTEMBER 2018 


© 2018 Springer Nature Limited. All rights reserved. 


LETTER 


Fig. 1 | Interpolated maps showing continental-scale patterns of
macroporosity before and after accounting for non-climatic factors
using regression with spatially correlated errors. a–d, Effective porosity
(EP; a, b) and residual effective porosity (REP; c, d) for the surface (a, c)
and subsurface (b, d) layers examined in this study. Effective porosity
values represent volumetric porosity. Arrows in a point to selected
physiographic regions: RMS, Rocky Mountain System; NEP, New England
province; AP, Atlantic Plain; IMP, Intermontane Plateaus. Arrows in c
point to selected Köppen-Geiger climatic zones referred to in the text.
Arrows in d illustrate the lateral extent of residual effective porosity in
subsurface layers compared to surface layers (c). Axes: longitude (°) and
latitude (°).


Figure 1a, b shows interpolated distributions of effective porosity
for surface and subsurface layers. The visual pattern of the effective
porosity of a surface layer follows the spatial distribution of physiographic
provinces, with high values in the Rocky Mountain System
and Intermontane Plateaus in the western half of the conterminous
USA and in the New England province and Atlantic Plain in the east.
This pattern suggests that the distribution of macroporosity is largely
a function of soil texture¹³ and possibly mineralogy, both of which are
derived from soil parent material. For subsurface layers, the pattern is
more muted compared to surface layers, especially in the Atlantic Plain,
probably reflecting the illuvial processes that have increased the clay
content of the B horizons¹⁷ (Fig. 1b). The enrichment in translocated
clay and the lower effective porosity values in subsurface layers
also suggest that soil texture is the dominant variable controlling the
expression of effective porosity at this scale. However, when the influence
of soil texture and organic matter is removed, the distribution
follows a more distinct pattern with climate (Fig. 1c, d), with low residual
effective porosity of the surface layers occurring in fully humid
snow climates (Df in the modified Köppen-Geiger classification¹⁸),
moderate residual effective porosity values in warm temperate climates
that either have dry summers (Cs) or are fully humid (Cf), and high
residual effective porosity values in arid steppe (BS) climates. These
results indicate that drier and warmer climates promote the development
of surface-layer macroporosity, whereas more humid and cooler
climates restrict the expression of macroporosity. Residual effective
porosity in subsurface layers follows the same basic pattern as in surface
layers but shows a larger lateral extent of high values in both eastern and
western directions (Fig. 1d). This subsurface residual effective porosity
pattern resembles the distribution of precipitation timing (see Extended
Data Fig. 2), probably owing to the effect of rainfall frequency on the
average penetration depth of percolating meteoric water.

We plot the mean residual effective porosity for samples obtained
from locations nearest US Historical Climatology Network (USHCN)
stations that recorded observations between 1951 and 2011 against
four climatological parameters (mean annual precipitation, mean
atmospheric vapour pressure deficit for the months of June, July
and August, mean precipitation event magnitude and mean freezing
atmospheric temperature frequency) to assess the influence of these
variables over soil macroporosity (Fig. 2). Residual effective porosity
shows an inverse relationship with both mean annual precipitation
and mean precipitation event magnitude and a positive relationship
with atmospheric demand for moisture (maximum vapour pressure
deficit) in natural surface layers (A horizons). This provides further
evidence that the development of macroporosity is more restricted under
humid climates with larger individual rainfall events than under drier
climates. Several factors may be responsible for this trend, including
faster turnover rates of macroaggregates in response to more frequent
pulses of water⁸, which would shift soil pores towards smaller sizes. We
observed a positive association with the frequency of freezing temperatures
in the atmosphere, suggesting that abiotic mechanisms may also
contribute to the development of macroporosity in the surface. Except
for freezing frequency, the same relationships between residual effective
porosity and climatological variables were observed in subsurface layers
(B horizons) but with lower variability, probably because they reflect
longer-term averages of atmospheric conditions (owing to their depth)
that have led to the development of these horizons.

Because Ap horizons represent similar and often frequent (annual 
to decadal) soil disturbance across various climate zones, owing to 
the relative geographic uniformity of agricultural management prac- 
tices, these surface layers act as a proxy for the rate of macroporosity 
regeneration. If soil macroporosity develops slowly (over pedogenic 
timescales), then land-surface disturbance from tillage would reduce 
or altogether eliminate the trends observed for natural surface or sub- 
surface layers. By contrast, Fig. 2 shows significant (P < 0.001) relationships
between the residual effective porosity of ploughed surface layers and
the climatological variables, as well as a reduction in variability around 
the trend line, suggesting that macroporosity may develop rapidly 
at the surface. It is likely that multiple mechanisms act in concert on 
surface layers to produce this macroporosity; nonetheless, these processes
appear to form macroporosity within years to decades¹² and are
therefore relevant for examining decadal-to-century-scale feedbacks 
between soil hydraulic properties, soil moisture and climate. 

We calculated changes in saturated hydraulic conductivities, K_s,
from current and predicted effective porosities using a generalized
Kozeny-Carman equation¹³ and end-of-century predictions of mean
annual precipitation in the Coupled Model Intercomparison Project
phase 5¹⁹ assuming a representative concentration pathway 6 (CMIP5
RCP6) scenario for five regions in the USA that have been predicted to
experience considerable changes in mean annual precipitation (Fig. 3).
Our results show a wide range of expected changes to the magnitude of K_s,
from −55% to +34%. With the exception of the Southeast Coastal
Plain, mean predicted values were negative and ranged from −25% to
−4%, signalling potentially less infiltration, more surface runoff and
erosion and greater susceptibility to flash-flooding.


Fig. 2 | Climatological trends in mean residual effective porosity for
natural surface, ploughed surface and subsurface layers. a–l, The mean
residual effective porosity is negatively correlated with the mean annual
precipitation (a, e, i) and the mean precipitation event magnitude (c, g, k),
and positively correlated with the mean maximum vapour pressure deficit
for June, July and August (JJA; b, f, j). The freezing frequency (d, h, l; in
number of events per year) showed noticeable correlation with the mean
residual effective porosity in the natural-surface layer only (d). Solid black
(linear trends) and dashed blue (95% slope confidence intervals) lines are
based on weighted linear regressions from 1,200 resampled data subsets.
n indicates the size of the resampled data pool. Dashed black lines mark
zero residual effective porosity for reference. Rows of panels: A horizons
(a–d), Ap horizons (e–h) and B horizons (i–l). Horizontal axes: mean annual
precipitation (mm), mean maximum JJA vapour pressure deficit (hPa),
precipitation event magnitude (mm) and freezing frequency (yr⁻¹). Vertical
axis: mean residual effective porosity.


Overall, our findings provide observational evidence that macroporosity
development is influenced by climate. This influence has not
been previously considered in land-atmosphere forecasting, and it
reinforces the hypothesis that climate change will probably intensify
the water cycle²⁰. Increases in mean annual precipitation and event
magnitude and decreases in atmospheric dryness and freezing temperature
frequencies appear to minimize the expression of macroporosity.
The rapid rate of macroporosity development for surface layers, along
with its disproportionate effects on saturated hydraulic conductivity,
may alter the distribution of soil moisture and affect related processes,
such as evapotranspiration²¹. Our results suggest that these feedbacks
should be incorporated into land-atmosphere parameterizations of
regional and global climate models to better understand and predict
the continental-scale hydrological cycle.


Fig. 3 | Expected per cent deviation of surface-layer (A horizon)
saturated hydraulic conductivity from current values by the end of
the century (2081–2100) for several regions in the USA. a–f, Regions
presented are: Northern Great Plains (NGP; a), Southern Great Plains
(SGP; b), Basin and Range (BR; c), Pacific Northwest (PN; d) and
Southern Coastal Plain (SCP; e). The inset map (f) shows the locations of
these regions. The largest mean magnitude changes are seen in PN, BR and
SCP, in response to precipitation changes in these regions. The direction of
the mean change is negative except in SCP.


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
https://doi.org/10.1038/s41586-018-0463-x.


Received: 11 July 2017; Accepted: 10 July 2018; 
Published online 5 September 2018. 


1. Beven, K. & Germann, P. Macropores and water flow in soils revisited. 
Wat. Resour. Res. 49, 3071-3092 (2013). 

2. Jarvis, N. J. A review of non-equilibrium water flow and solute transport in soil 
macropores: principles, controlling factors and consequences for water quality. 
Eur. J. Soil Sci. 58, 523-546 (2007). 

3. Janzen, H. H. et al. Global prospects rooted in soil science. Soil Sci. Soc. Am. J. 
75, 1-8 (2011). 

4. Watson, K. W. & Luxmoore, R. J. Estimating macroporosity in a forest watershed 
by use of a tension infiltrometer. Soil Sci. Soc. Am. J. 50, 578-582 (1986). 

5. Lawrence, D. M. et al. Parameterization improvements and functional and 
structural advances in version 4 of the Community Land Model. J. Adv. Model. 
Earth Syst. 3, M03001 (2011). 

6. Clark, M. P. et al. Improving the representation of hydrologic processes in Earth 
System Models. Wat. Resour. Res. 51, 5929-5956 (2015). 

7. Hassink, J. The capacity of soils to preserve organic C and N by their association 
with clay and silt particles. Plant Soil 191, 77-87 (1997).

8. Bronick, C. J. & Lal, R. Soil structure and management: a review. Geoderma 124, 
3-22 (2005). 

9. Taina, I. A., Heck, R. J., Deen, W. & Ma, E. Y. T. Quantification of freeze-thaw
related structure in cultivated topsoils using X-ray computer tomography.
Can. J. Soil Sci. 93, 533-553 (2013).

10. Brimhall, G. H. et al. Deformational mass transport and invasive processes in 
soil evolution. Science 255, 695-702 (1992). 

11. Platt, B. F., Kolb, D. J., Kunhardt, C. G., Milo, S. P. & New, L. G. Burrowing through 
the literature: the impact of soil-disturbing vertebrates on physical and 
chemical properties of soil. Soil Sci. 181, 175-191 (2016).




12. Robinson, D. A. et al. Experimental evidence for drought induced alternative
stable states of soil moisture. Sci. Rep. 6, 20018 (2016).

13. Rawls, W. J., Giménez, D. & Grossman, R. Use of soil texture, bulk density, and
slope of the water retention curve to predict saturated hydraulic conductivity.
Trans. ASAE 41, 983-988 (1998).

14. Nemes, A., Rawls, W. J. & Pachepsky, Y. A. Influence of organic matter on the
estimation of saturated hydraulic conductivity. Soil Sci. Soc. Am. J. 69,
1330-1337 (2005).

15. Tilman, D. Global environmental impacts of agricultural expansion: the need for
sustainable and efficient practices. Proc. Natl Acad. Sci. USA 96, 5995-6000
(1999).

16. Foley, J. A. et al. Global consequences of land use. Science 309, 570-574 (2005).

17. West, L. T., Shaw, J. N. & Mersiovsky, E. P. in The Soils of the USA (eds West, L. T.
et al.) Ch. 13 (Springer, Cham, 2017).

18. Peel, M. C., Finlayson, B. L. & McMahon, T. A. Updated world map of the
Köppen-Geiger climate classification. Hydrol. Earth Syst. Sci. 11, 1633-1644
(2007).

19. Taylor, K. E., Stouffer, R. J. & Meehl, G. A. An overview of CMIP5 and the
experimental design. Bull. Am. Meteorol. Soc. 93, 485-498 (2012).

20. Huntington, T. G. Evidence for intensification of the global water cycle: review
and synthesis. J. Hydrol. 319, 83-95 (2006).

21. Jung, M. et al. Recent decline in the global land evapotranspiration trend due to
limited moisture supply. Nature 467, 951-954 (2010).


Acknowledgements D.R.H. and D.G. thank R. Miskewitz for assistance in 
assigning Köppen-Geiger classes to the samples in the dataset. A.N., D.G. and
D.R.H. thank the Norwegian Institute of Bioeconomy Research (NIBIO) for 
financial support. N.A.B. acknowledges funding support through USDA-AFRI 
2014-67003-22070. 


Reviewer information Nature thanks P. Hallett, D. Robinson and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 


Author contributions D.R.H., D.G., A.N. and N.A.B. designed the study 
examining effective porosity with climate. A.N. and D.R.H. compiled the USDA- 
NRCS NCSS data. N.A.B. and C.J.W. compiled and analysed the atmospheric 
data. D.R.H. wrote the first draft of the paper and, with R.K., conducted the 
statistical analyses. All authors edited and commented on the manuscript and 
contributed to later iterations. 


Competing interests The authors declare no competing interests. 


Additional information 

Extended data is available for this paper at https://doi.org/10.1038/s41586-018-0463-x.

Reprints and permissions information is available at http://www.nature.com/reprints.

Correspondence and requests for materials should be addressed to D.R.H. 
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional 
claims in published maps and institutional affiliations. 




METHODS 


Soil data. Soil information was selected from samples measured for sand 
(50–2,000 μm), silt (2–50 μm) and clay (less than 2 μm) contents, organic carbon
content, field-capacity bulk density, air-dried bulk density and gravimetric water
content at field capacity (−33 kPa) in the USDA-NRCS NCSS Characterization
Database (https://ncsslabdatamart.sc.egov.usda.gov). In addition, only samples 
with recorded sampling locations were chosen. Samples representing surface 
layers were chosen if the midpoint of the soil morphological horizon (that is, the 
sampling interval) was within 25 cm of the land surface; subsurface layers were 
taken as those samples with midpoint depths between 25 cm and 100 cm of the 
land surface. A summary of the number of samples, names of the morphological
horizons considered and depths is shown in Extended Data Table 1. We are
unaware of any other soil database of this methodological consistency, extent 
and sample density, with soil physical properties measured directly as opposed 
to predicted via pedotransfer functions. Because this database contains relatively 
few samples in other areas of the world, we have restricted our analysis to within 
the USA. 

Calculation of effective porosity (EP) and residual EP (REP). Total porosity
(φ) was calculated using the bulk density (ρ_fc) of samples equilibrated to a pressure
potential of ψ_fc = −33 kPa:

$$\phi = 1 - \frac{\rho_{fc}}{\rho_p}$$

where ρ_p is the average density of the soil particles, which is frequently assumed
to be 2.65 Mg m⁻³. The bulk density was also used to convert the gravimetric
soil water content (w_fc) of samples equilibrated to −33 kPa to volumetric water
content (θ_fc):

$$\theta_{fc} = \frac{w_{fc}\,\rho_{fc}}{\rho_w}$$

where ρ_w is the density of water, assumed to be 1 Mg m⁻³. Effective porosity was
calculated as the difference between φ and θ_fc, corresponding to pore sizes larger
than 9 μm in diameter.
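
The calculation above can be sketched in a few lines. This is an illustrative sketch rather than the authors' R scripts; the function name, argument names and example values are hypothetical, and the default densities are the assumed values stated in the text.

```python
def effective_porosity(bulk_density_fc, grav_water_fc, particle_density=2.65, water_density=1.0):
    """Effective porosity from field-capacity bulk density and gravimetric water content.

    bulk_density_fc -- bulk density at -33 kPa (Mg m^-3)
    grav_water_fc   -- gravimetric water content at -33 kPa (g water per g dry soil)
    """
    total_porosity = 1.0 - bulk_density_fc / particle_density   # volume fraction of all pores
    theta_fc = grav_water_fc * bulk_density_fc / water_density  # volumetric water content at field capacity
    return total_porosity - theta_fc                            # pores that drain under gravity


# Hypothetical sample: bulk density 1.325 Mg m^-3 and 20% gravimetric water at field capacity.
ep_example = effective_porosity(1.325, 0.20)  # total porosity 0.5 minus theta_fc 0.265
```

In this example, half of the soil volume is pore space and about 0.235 of the volume is drainable at field capacity.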

To account for possible effects that could arise from differences in particle-size
distribution, soil organic matter and spatial location, EP was predicted (EP̂) using
a spatial error-regression model. The form of the regression equation linking EP
to soil properties is

$$\mathrm{EP} = C_0 + C_1 f_s + C_2 f_c + C_3 f_{soc} + C_4 f_s f_c + C_5 f_c f_{soc} + C_6 f_s f_{soc} + C_7 f_s f_c f_{soc} + \varepsilon \quad (1)$$

where f_s, f_c and f_soc are the mass fractions of sand, clay and soil organic carbon,
respectively, C_i is the ith coefficient of the multiple regression model and ε denotes
spatially correlated errors between neighbouring observations²². We used equation (1)
to account for both direct effects on EP from sand, clay and soil organic
carbon and effects that arise from their interactions for surface (A horizons) and
subsurface (B horizons) layers separately.

Equation (1) is a multiple regression model, which can be represented using
standard notation (simplified to one independent variable) as

$$y = X\beta + \varepsilon \quad (2)$$

where y is the dependent variable, β is the coefficient for the explanatory or
independent variable X and ε represents an error vector. The form of the regression
used had a spatial autoregressive process in the error term ε as follows²³:

$$\varepsilon = \lambda W \varepsilon + \xi$$

Equation (2) can then be re-expressed as

$$y = X\beta + (I - \lambda W)^{-1}\xi$$

where λ is the spatial autoregressive coefficient for the error lag, Wε, W is a
distance-based spatial weights matrix, ξ is an uncorrelated and homoscedastic error
term, and I is the identity matrix. Thus, the error covariance becomes

$$E[\varepsilon\varepsilon'] = \sigma^2 (I - \lambda W)^{-1} (I - \lambda W')^{-1} \quad (3)$$

A spatial autoregressive error process leads to non-zero error covariances
between every pair of observations. The error covariances decrease in magnitude
with separation distance as the errors covary less owing to spatial autocorrelation
when the separation distance is larger. The complex structure in the inverse matrices
in equation (3) yields non-constant diagonal elements in the error covariance
matrix, thus inducing heteroscedasticity in ε²³.
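
A small numerical example may make these properties concrete. The sketch below evaluates the covariance in equation (3) for three hypothetical sites on a line, using a row-standardized neighbour matrix and λ = 0.5. The weights matrix, the value of λ and the function names are all assumptions chosen for illustration; the study itself used distance-based weights and fitted the model in GeoDa.

```python
def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sar_error_covariance(weights, lam, sigma2=1.0):
    """Covariance sigma^2 (I - lam W)^-1 (I - lam W')^-1 of a spatial autoregressive error."""
    n = len(weights)
    a = [[(1.0 if i == j else 0.0) - lam * weights[i][j] for j in range(n)] for i in range(n)]
    a_inv = invert(a)
    # (I - lam W')^-1 is the transpose of (I - lam W)^-1, so the product is A^-1 (A^-1)'.
    return [[sigma2 * sum(a_inv[i][k] * a_inv[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]


# Three hypothetical sites in a row: the middle site has two neighbours, the end sites one each.
W = [[0.0, 1.0, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 1.0, 0.0]]
cov = sar_error_covariance(W, lam=0.5)
```

In this example the variance of the middle site differs from that of the end sites (non-constant diagonal), and the covariance between the two end sites is smaller than that between adjacent sites, consistent with the description above.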


REP was calculated as the difference between the EP calculated from measurements
of ρ_fc and θ_fc and the EP predicted (EP̂) using equation (1):

$$\mathrm{REP} = \mathrm{EP} - \widehat{\mathrm{EP}} \quad (4)$$

REP expresses values of EP that are independent of particle size, soil organic carbon
and spatial location and can therefore be used to investigate the effect of climate
on EP.
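
The idea behind equations (1) and (4) can be illustrated with an ordinary least-squares fit. This is a deliberate simplification: the spatially correlated error term is ignored here, whereas the study fitted a spatial error model. All names, coefficients and data below are hypothetical and generated only for the example.

```python
import random


def design_row(fs, fc, fsoc):
    """Predictors of equation (1): intercept, main effects and all interactions."""
    return [1.0, fs, fc, fsoc, fs * fc, fc * fsoc, fs * fsoc, fs * fc * fsoc]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(rows, ep):
    """Least-squares coefficients from the normal equations (X'X) c = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ep)) for i in range(k)]
    return solve(xtx, xty)


def residual_ep(rows, ep, coeffs):
    """REP = measured EP minus EP predicted from texture and organic carbon."""
    return [y - sum(c * v for c, v in zip(coeffs, r)) for r, y in zip(rows, ep)]


rng = random.Random(0)
# Hypothetical samples: (sand, clay, organic-carbon) mass fractions.
samples = [(rng.uniform(0.1, 0.6), rng.uniform(0.05, 0.35), rng.uniform(0.0, 0.1)) for _ in range(60)]
rows = [design_row(*s) for s in samples]
# Hypothetical texture-driven EP plus an offset that the covariates cannot explain.
offsets = [0.02 if i % 2 == 0 else -0.02 for i in range(60)]
ep = [0.25 + 0.10 * fs - 0.20 * fc + 0.50 * fsoc + off for (fs, fc, fsoc), off in zip(samples, offsets)]
coeffs = fit_ols(rows, ep)
rep = residual_ep(rows, ep, coeffs)
```

The residuals retain the part of EP that the covariates do not explain, which is the quantity later compared with climate variables.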

Calculation of saturated hydraulic conductivity. The slope of the water retention
curve (D) between θ_fc and the water content at the wilting point (θ_wp) was
calculated as

$$D = \frac{\log(\theta_{wp}/\theta_{fc})}{\log(1{,}500/33)}$$

where 1,500 and 33 represent the absolute values of the pressure potentials at the
wilting point and field capacity, respectively, in kilopascals¹³. The saturated hydraulic
conductivity (in units of millimetres per hour) was calculated from D as

$$K_s = 1{,}930\,\mathrm{EP}^{3-D}$$

where 1,930 is an empirically determined coefficient.
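
Because K_s depends on EP through a power law, a modest relative change in EP produces a larger relative change in K_s. The sketch below illustrates this with hypothetical numbers. The function names are assumptions, and D is passed in explicitly because the sign convention used for the slope is treated here as the caller's choice rather than inferred.

```python
import math


def retention_slope(theta_wp, theta_fc):
    """Log-log slope of the retention curve between field capacity (33 kPa) and wilting point (1,500 kPa)."""
    return math.log10(theta_wp / theta_fc) / math.log10(1500.0 / 33.0)


def saturated_conductivity(ep, d):
    """K_s in mm per hour from effective porosity and the exponent term D."""
    return 1930.0 * ep ** (3.0 - d)


def percent_change_in_ks(ep_now, ep_future, d):
    """Per cent change in K_s when EP changes; the empirical coefficient cancels out."""
    return 100.0 * ((ep_future / ep_now) ** (3.0 - d) - 1.0)


# Hypothetical values: EP falls by 10% (0.20 to 0.18) with D = 1.
ks_now = saturated_conductivity(0.20, 1.0)      # 1930 * 0.2**2
change = percent_change_in_ks(0.20, 0.18, 1.0)  # (0.9**2 - 1) * 100
slope = retention_slope(0.10, 0.30)             # literal ratio; negative when theta_wp < theta_fc
```

With these numbers, a 10% decrease in EP lowers K_s by about 19%.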

Climatological data. Daily precipitation and minimum and maximum temperature 
data were obtained from all USHCN stations that had a continuous record between 
1951 and 2011²⁴. Precipitation data were used to calculate mean annual precipitation
for each station. In addition, mean event magnitude and timing were fitted to precipitation
data assuming exponential and Poisson distributions, respectively²⁵. For each
station, temperature data were used to calculate the freezing frequency as the mean
number of times per year that the atmospheric temperature dropped below 0 °C.
Mean maximum daily vapour pressure deficit values were obtained from the PRISM
gridded climate data²⁶. Predictions of end-of-century mean annual precipitation
were obtained from the CMIP5¹⁹ multi-model ensemble assuming an RCP6 scenario.
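
The station statistics described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: a wet day is taken as any day with precipitation above a threshold, the exponential and Poisson parameters are estimated by their maximum-likelihood values (the mean wet-day amount and the fraction of wet days), and a freezing event is counted as a day whose minimum temperature is below 0 °C. Function names and data are hypothetical.

```python
import statistics


def precipitation_summary(daily_mm, n_years, wet_threshold=0.0):
    """Return (mean annual precipitation, mean event magnitude, event rate per day)."""
    events = [p for p in daily_mm if p > wet_threshold]
    mean_annual = sum(daily_mm) / n_years
    mean_event = statistics.fmean(events)   # maximum-likelihood mean of an exponential distribution
    event_rate = len(events) / len(daily_mm)  # maximum-likelihood rate of Poisson arrivals per day
    return mean_annual, mean_event, event_rate


def freezing_frequency(daily_tmin_c, n_years):
    """Mean number of days per year with minimum temperature below 0 degrees C (assumed definition)."""
    return sum(1 for t in daily_tmin_c if t < 0.0) / n_years


# Hypothetical short records, for illustration only.
mean_annual, mean_event, event_rate = precipitation_summary([0, 10, 0, 0, 5, 0, 15, 0], n_years=1)
freeze = freezing_frequency([-1.0, 2.0, -3.0, 0.0, 1.0, -0.5], n_years=2)
```

The reciprocal of the event rate gives the mean time between rainfall events, which is one way to express precipitation timing.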
Statistical analysis. Effective porosity and climate point data were interpolated onto 
a 1° × 0.5° (approximately 85 km × 55 km) grid across the continental USA using a tricubic-weighted
least-squares quadratic trend surface²⁷ to smooth and visualize the spatial
patterns in EP, REP and precipitation event magnitude and timing in Fig. 1 and
Extended Data Fig. 2. Variograms of EP were calculated for each layer; REP values 
were calculated from equations (1) and (4) using a distance-based spatial-weight set 
based on the range parameter of the variograms to determine the spatially correlated 
error. The REP from soil samples with locations closest to each USHCN station were 
averaged and paired with the atmospheric data from each station. This aggregation 
procedure reintroduced some spatial autocorrelation; thus, we computed variograms 
to determine the scale of spatial variation in the aggregated data for each layer. We used 
a resampling method to randomly select subsamples of the aggregated data that were 
spaced at intervals that were at least twice the variogram range (600 km, 200 km and 
840 km for the A, Ap and B horizons, respectively) to ensure spatial independence. 
This resampling method produced subsets of aggregated observations separated by 
distances greater than the intervals mentioned above. Weighted ordinary least-squares 
regression (that is, weighted by the number of samples averaged to represent each aggre- 
gated observation) was run on each subset and the presence of spatial autocorrelation 
in the residuals checked using Moran's I. When significant spatial autocorrelation 
was identified, that random subset was discarded and another selected until a total of 
1,200 subsets was reached for each layer. Outliers in each subset were detected using 
an adjusted Mahalanobis distance”* and removed prior to the regression analysis. 
Relationships between climatological variables and mean REP were examined using 
weighted ordinary least-squares regression for each of the 1,200 random subsets. The 
lines in Fig. 2 represent the average of these 1,200 regressions whereas the points rep- 
resent the pool of samples from which the random subsets were selected. Spatial error 
regression was performed in GeoDa”. All other analyses were performed using R”’. 
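
The resampling and regression steps described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' R workflow: subsets are built by a greedy random selection that keeps a station only if it lies at least a minimum great-circle distance from every station already kept, and the regression is an ordinary weighted least-squares fit of one predictor. The Moran's I check and the Mahalanobis outlier screen are omitted, and all function names are hypothetical.

```python
import math
import random

EARTH_RADIUS_KM = 6371.0


def great_circle_km(p, q):
    """Haversine distance in km between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def spaced_subset(coords, min_km, rng):
    """Return indices of a random subset whose members are all >= min_km apart."""
    order = list(range(len(coords)))
    rng.shuffle(order)
    kept = []
    for i in order:
        if all(great_circle_km(coords[i], coords[j]) >= min_km for j in kept):
            kept.append(i)
    return kept


def weighted_least_squares(x, y, w):
    """Return (slope, intercept) of a weighted least-squares line."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

In a workflow of this kind, `spaced_subset` would be called repeatedly with a different random state, the regression fitted to each subset with weights equal to the number of soil samples averaged per station, and the fitted lines averaged across subsets.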

Code availability. The R scripts used to generate and analyse the data are publicly 
available in the GitHub repository, 
https://github.com/danielhirmas/nature2017-07-09186B. 

Data availability. The soil and climatological datasets generated and analysed 
during this study are publicly available in the GitHub repository, 
https://github.com/danielhirmas/nature2017-07-09186B. The soil datasets used in 
this study are also publicly available through the USDA-NRCS NCSS data repository, 
http://ncsslabdatamart.sc.egov.usda.gov/. 

Extended Data Fig. 1 | Distribution of selected soil samples and USHCN 
weather stations used in this study. a–d, Locations of A horizons (a), 
B horizons (b), Ap horizons (c) and USHCN weather stations (d). Weather 
stations that recorded continuous data since at least 1951 were selected. 
Soil sample data were obtained from the NCSS Characterization Database 
on 10 July 2013. Depth and soil morphological criteria for selection of the 
A, Ap and B horizon samples are given in Extended Data Table 1. 

Extended Data Fig. 2 | Interpolated maps of mean precipitation 
magnitude per event and mean precipitation event timing from 
USHCN weather station data. a, Interpolated map of the mean 
precipitation magnitude per event (PM; in millimetres), calculated 
assuming that the magnitude of precipitation events followed an 
exponential distribution. b, Interpolated map of the mean precipitation 
event timing (PT; in events per day), calculated from a Poisson 
distribution for days with a precipitation event. Weather stations that 
recorded continuous data since at least 1951 were selected. 

Extended Data Table 1 | Data selection criteria for samples used in this study 

Soil layer   Depth* (cm)   n      Fraction of total† (%)   Representative horizons‡ 
A horizon    0-25          3582   44.8                     A, Ac, Ag, Ass 
Ap horizon   0-25          3248   40.6                     Ap, Apc, Apg, Apt, ABp 
B horizon    25-100        5756   37.6                     B, Bw, Bt, Bg, Bss, Bc, Btss, Btg, Btc, Btm, Bwg, Bwc, Bssg 

*Only samples with a horizon midpoint depth within the specified depth intervals were selected. 
†The fraction of the total is given as the percentage of all A or B horizons in the database with 
the necessary data (location, bulk density, water contents at pressure potentials of −33 kPa and 
−1,500 kPa, sand, clay and organic carbon), falling within the respective depth intervals and in 
the conterminous USA. 
‡Morphological horizon designations were selected to reduce effects from local addition of 
material (for example, carbonate) to the surface, from disturbance by management practices 
and from lithologic discontinuities. Symbols represent: c, concretions or nodules; g, strong gley 
colours; m, continuous pedogenic cementation; p, ploughed layer; ss, slickensides; t, illuvial 
accumulation of silicate clay; and w, weak colour or structure within the B horizon. 


Jurassic stem-mammal perinates and the origin of 
mammalian reproduction and growth 


Eva A. Hoffman1* & Timothy B. Rowe1,2,3 


Transformations in morphology, physiology and behaviour 
along the mammalian stem lineage were accompanied by 
profound modifications to reproduction and growth, including 
the emergence of a reproductive strategy characterized by high 
maternal investment in a small number of offspring1,2 and 
heterochronic changes in early cranial development associated 
with the enlargement of the brain3. Because direct fossil evidence 
of these transitions is lacking, the timing and sequence of these 
modifications are unknown. Here we present what is, to our 
knowledge, the first fossil record of pre- or near-hatching young 
of any non-mammalian synapsid. A large clutch of well-preserved 
perinates of the tritylodontid Kayentatherium wellesi (Cynodontia, 
Mammaliamorpha) was found with a presumed maternal skeleton 
in Early Jurassic sediments of the Kayenta Formation. The single 
clutch comprises at least 38 individuals, well outside the range 
of litter sizes documented in extant mammals. This discovery 
confirms that production of high numbers of offspring represents 
the ancestral condition for amniotes, and also constrains the timing 
of a reduction in clutch size along the mammalian stem. Although 
tiny, the perinates have an overall skull shape that is similar to 
that of adults, with no allometric lengthening of the face during 
ontogeny. The only positive allometries are associated with the 
bones that support the masticatory musculature. Kayentatherium 
diverged just before a hypothesized pulse of brain expansion that 
reorganized cranial architecture at the base of Mammaliaformes4–6. 
The association of a high number of offspring and largely isometric 
cranial growth in Kayentatherium is consistent with a scenario in 
which encephalization (and attendant shifts in metabolism and 
development7,8) drove later changes to mammalian reproduction. 

The new specimens were discovered in the matrix underlying the 
partial skeleton of a Kayentatherium adult recovered from the Early 
Jurassic Kayenta Formation of northeastern Arizona. Remains of 
numerous tiny juveniles were found within a perimeter formed by the 
ribs, forelimb elements and vertebrae of the adult, which also preserves 
teeth and fragmentary jaws (Figs. 1, 2e–f, Extended Data Figs. 1, 2, 
Extended Data Table 1a, b, Supplementary Video 1). The young are 
represented by ten mostly complete skulls, as well as isolated jaws, 
teeth and postcranial elements (Fig. 2a–d, Extended Data Table 1a, c, 
Supplementary Videos 2, 3). The immature bones are thin and 
diaphanous, yet well preserved. Distortion other than lateral compression 
is minimal, and positional relationships within the skull and among 
regions of the body are often maintained. Although three tritylodontid 
species are reported from the Kayenta Formation, dental features 
confirm the identification of adult and young as K. wellesi9–11 (Fig. 2c–g, 
Supplementary Videos 4, 5). 

Because almost no variation can be detected in size, tooth development 
or degree of ossification, the young are interpreted as belonging to a single 
clutch, presumably of the associated adult. Clutch size was estimated 
conservatively as one-half of the number of dentaries visible by microscope 
inspection or in micro-computed tomography scans (Supplementary 
Information); a partial dentary was counted if it preserved two erupted 
molariforms (Extended Data Table 1a, Supplementary Table 1). The 
minimum clutch size obtained by this method was 38, which is more than 
twice the average litter size of any mammal but is similar to average clutch 
size in crocodilians and a few squamates (Fig. 3, Supplementary Table 2). 
The discovery of a large clutch in a stem mammal provides material 
evidence that producing high numbers of offspring is the ancestral 
condition for amniotes, and that small litters represent a derived mammalian 
trait”. There is no sign of eggshell with the new specimens—although 
the shell, if it was similar to that of monotremes and lepidosaurs, would 
have been leathery and unlikely to fossilize12,13. The skull length of 
the Kayentatherium young is 1/20 that of a large adult (Museum of 
Comparative Zoology (MCZ), specimen number 8812), and 1/10 that 
of a small adult (MCZ 8811), which on the basis of dental widths was 
probably similar in size to the presumed mother (Fig. 2c—g, Extended 
Data Table 1d). The relative tininess of the individuals, the absence of 
discernible tooth wear (Fig. 2c, d) and the uniformity of ontogenetic stage 
(which indicates that there was little competition among the offspring 
for resources14) all suggest that the young were newly hatched, if not 
embryos. We refer to them neutrally as perinates. 
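
The counting rule for the minimum clutch size can be sketched as follows. This is an illustration of the stated rule only: each dentary is represented by a hypothetical pair (whether it is complete, number of erupted molariforms), a partial dentary is assumed to qualify when it preserves at least two erupted molariforms, and the records in the example are invented rather than taken from the specimen.

```python
def conservative_clutch_size(dentaries):
    """Return one-half of the number of countable dentaries.

    Each record is (is_complete, erupted_molariforms). A partial dentary is
    counted only if it preserves at least two erupted molariforms (assumed
    reading of the rule in the text).
    """
    countable = sum(
        1 for is_complete, erupted in dentaries
        if is_complete or erupted >= 2
    )
    return countable / 2


# Invented records: two complete dentaries, two qualifying partials, one rejected.
example_records = [(True, 2), (False, 2), (False, 1), (True, 2), (False, 3)]
example_estimate = conservative_clutch_size(example_records)
```

Halving the count treats every countable dentary as one side of a pair, so the estimate is a lower bound whenever only one dentary of an individual is preserved.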

Fig. 1 | Volumetric rendering of original field jacket with excavated 
sediment containing adult and perinatal Kayentatherium remains 
(Vertebrate Paleontology Laboratory (TMM) 43690-5). 
See Supplementary Information for details on catalogue numbers and 
computed tomography scanning. Adult bones that had weathered to the 
surface are not shown. Stars mark the positions of selected perinates, 
illustrated in Extended Data Fig. 2. Yellow stars indicate perinatal 
individuals that preserve paired dentaries and that were counted towards 
our census; green stars indicate parts of perinatal individuals that were 
not counted towards our census (or counted as one-half; Extended Data 
Table 1a, Supplementary Table 1). Because stars are larger than perinatal 
bones, the positions of the stars are approximate. ph, phalanges. 


1Jackson School of Geosciences, The University of Texas at Austin, Austin, TX, USA. 2The University of Texas High-Resolution X-ray Computed Tomography Facility, The University of Texas at Austin, 
Austin, TX, USA. 3Vertebrate Paleontology Laboratory, The University of Texas at Austin, Austin, TX, USA. *e-mail: eva.hoffman@utexas.edu 


104 | NATURE | VOL 561 | 6 SEPTEMBER 2018 


© 2018 Springer Nature Limited. All rights reserved. 


Fig. 2 | Skull of representative perinate, and dental anatomy of 
perinatal and adult Kayentatherium. a, b, Perinatal skull (TMM 
43690-5.035a) and coronal sections at indicated levels (I-IV). The right 
side of the skull is shown reversed. For simplicity, the single upper and 
lower replacement incisors of Kayentatherium are designated I^1 and 
I_1, and their replacements I^1r and I_1r (but see previous studies10,11). 
c, d, Teeth of perinate (TMM 43690-5.017a). c, Right anteriormost 
lower molariform (M_0) in lingual and occlusal views, and left lower 
molariforms in labial view. d, Left upper molariform in labial (oblique) 
and occlusal views. e–g, Teeth of adult. e, Left lower molariforms in 
labial view. f, Upper molariform in labial (oblique) and occlusal views. 
g, Broken left dentary showing five lower molariforms in occlusal view. 
The distalmost molariform is unerupted. de, dentary; fr, frontal; I_1r, 
replacement lower incisor; I^1r, replacement upper incisor; ju, jugal; 
M_0–M_3, lower molariforms 0–3; M^1, M^2, upper molariforms 1, 2; na, nasal; os, 
orbitosphenoid; pal, palatine; par, parietal; pt, pterygoid; sq, squamosal. 


Although aggregations of juvenile stem mammals have occasionally 
been described previously, the smallest young reported thus far 
are greater than one-fourth of adult size, and therefore well beyond 
the perinatal period15,16. By contrast, the new specimens provide 
fossil evidence regarding very early development in non-mammalian 
synapsids. Ontogenetic series of Sphenodon punctatus17,18 (Fig. 4a–c) 
and Monodelphis domestica (Fig. 4d–g) are shown for comparison 
with Kayentatherium (Fig. 4h–j). In absolute size, the Kayentatherium 
perinates are similar to Sphenodon hatchlings and early Monodelphis 
pouch young (postnatal day 27; Fig. 4k). 

The perinatal and adult skulls of Kayentatherium are notably similar 
in overall form. The growth of the face with respect to the braincase in 
reptiles and mammals is illustrated in Fig. 4. Whereas crocodilians and 
birds have autapomorphic extensions of the face19, lepidosaurs retain the 
primitive amniote condition in which the skull grows mainly in 
isometry, with relative facial length constant during ontogeny (Fig. 4a–c). 
In mammals, the early skull accommodates a relatively much larger 
brain’, with the result that embryos (or marsupial pouch young) have 
inflated braincases and short faces. Later in ontogeny, the expansion 
of the ethmoid olfactory skeleton, secondary palate and dentition 
causes the mammalian face to lengthen markedly as a proportion of 
the skull20,21 (Fig. 4d–g). 

In Kayentatherium, skull ontogeny is plesiomorphic, with a relative 
facial length (measured from the zygomatic root to the tip of the 
premaxilla) that remains unchanged from perinate to large adult (Fig. 4h–j). 
As a formal test of isometry, we performed a standard-major-axis 
regression of facial length on skull length, a common proxy for overall 
body size22. The coefficient of allometry, b, equals 1.00, indicating 
that growth of the face is isometric (Extended Data Fig. 3, Extended 
Data Table 1d). 
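
A standard-major-axis fit of this kind can be sketched as follows. The sketch assumes that both measurements are log10-transformed before fitting, so that the slope is the coefficient of allometry b; the function name is hypothetical and the measurements in the example are invented, not taken from Extended Data Table 1d.

```python
import math


def sma_allometry(x, y):
    """Return (slope, intercept) of a standard-major-axis fit on log10 data.

    The slope equals the ratio of the standard deviations of log10(y) and
    log10(x), carrying the sign of their covariance.
    """
    log_x = [math.log10(v) for v in x]
    log_y = [math.log10(v) for v in y]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((a - mean_x) ** 2 for a in log_x)
    syy = sum((b - mean_y) ** 2 for b in log_y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(log_x, log_y))
    if sxx == 0 or syy == 0 or sxy == 0:
        raise ValueError("slope is undefined for constant or uncorrelated data")
    slope = math.copysign(math.sqrt(syy / sxx), sxy)
    return slope, mean_y - slope * mean_x


# Invented example: facial length is a fixed fraction of skull length.
skull_length = [1.0, 2.0, 4.0, 8.0]
facial_length = [0.5, 1.0, 2.0, 4.0]
b, log_intercept = sma_allometry(skull_length, facial_length)
```

A slope of 1 indicates isometry, a slope above 1 positive allometry and a slope below 1 negative allometry. Unlike ordinary least squares, the standard major axis treats both variables as measured with error, which is why it is often preferred for allometric comparisons.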

As in reptile ontogeny, positive allometries in Kayentatherium ontogeny 
mostly involve maturation of the masticatory system. During 
growth, the zygoma deepens and bows outward, a tall sagittal crest 
develops, the pterygoid transverse process lengthens and widens, and 
muscle attachments to the dentary expand and subtly reshape the 
coronoid process (Extended Data Fig. 4). These changes all reflect the 
attainment of large, forceful masticatory muscles at maturity23. In our dataset, 
the maximum height of the zygomatic arch, length of the transverse 
process and width of the transverse process increase with positive 
allometry (coefficient of allometry or slope of log-log plot >1). Other 
skull measurements show isometry (Extended Data Fig. 3, Extended 
Data Table 1d). 

The endocranial spaces and delicate bones of the palate, braincase 
and skull roof are visible in micro-computed tomography cross-sections 
of a perinatal skull (Fig. 2a, b, Supplementary Videos 6–8). Inferred 
brain shape reflects both phylogeny and ontogeny. The orbitosphenoids 
are high, as in other non-mammaliaform cynodonts, which indicates 
that the anterior forebrain is small6. However, the parietals preserve 
impressions of cerebral hemispheres that are tall domes separated by a 
deep interhemispheric sulcus (Fig. 2b), a condition that is not observed 
in more basal cynodonts. In the Kayentatherium perinates, the brain 
is wide relative to the rest of the skull, but later in development the 
brain becomes relatively narrow (Fig. 4h–j). Apart from the domed, 
separated cerebral hemispheres (which persist into adulthood), adult 
brain proportions are more similar to the tubular brains of more basal 
cynodonts than to the large, sub-spherical brains of mammals6,24. 

Fig. 3 | Clutch size of Kayentatherium and other amniotes, and 
phylogenetic position of Kayentatherium. a, Graph of average (mean or 
midpoint) clutch or litter size in 687 polytocous species of extant reptiles 
and mammals. The minimum size of the fossil clutch described here is 
indicated. b, Log-log plot of total clutch or litter mass versus adult mass 
(see Supplementary Information for explanation of body-mass estimation 
for adult and perinatal Kayentatherium). Data and sources can be found 
in Supplementary Table 2. c, Phylogeny of advanced cynodonts; relative 
phylogenetic positions of the included taxa are taken from previously 
published phylogenies*®. Daggers indicate fossil taxa. Dotted lines indicate 
the two likeliest possibilities for the phylogenetic position of Pachygenelus 
with respect to Kayentatherium. 

The teeth of the perinates are few in number and large relative to the 
jaws (Extended Data Fig. 4). Each complete jaw—upper and lower— 
contains an incisor and its replacement, two erupted molariforms and 
the cusps of two unerupted molariforms (Fig. 2a, Extended Data Fig. 5). 
The slender postdentary elements are distinguishable in the postdentary 
trough, and even the reflected lamina is thin and fragile but unbroken, 
with a pointed distal end (Extended Data Fig. 6). 

Owing to their characteristic interlocking design, the teeth are 
commonly preserved in occlusion (Fig. 2a, b). Molariforms 1–4 are always 
present, and an additional, diminutive lower molariform is preserved 
in a single specimen on the left and right sides (Fig. 2c). No similar 
tooth has previously been reported in a tritylodontid. This tooth (M_0) 
is displaced on the right but preserved in situ on the left, anterior to 
the first pair of occluding molariforms. Its length is barely half that of 
the succeeding lower molariform (M_1) and its shape is unique, with 
three major cusps rather than four. The molariforms of tritylodontids 
are added in conveyor-belt fashion from the back of the tooth row, so 
that the anteriormost teeth are the oldest; teeth are eventually lost from 
the front of the jaw, opening a long diastema11,25. The small size and 
anomalous cusp configuration of M_0 suggest that it developed very 
early in ontogeny. 

Distal to M_0, a strong mesiodistal size gradient is present along the 
tooth row and reflects the ongoing growth of the jaw'®')° (Fig. 2c, 
Extended Data Fig. 5). If the rate of tooth formation was constant, the 
disproportionate size gap between M_0 and M_1 provides some evidence 
of faster growth earlier in development. On the basis of wear patterns, 
previous authors have proposed that tritylodontids had a delayed 
onset of tooth development, with the implication that offspring must 
have required adult provisioning26. We show that tooth eruption in 
Kayentatherium began early in ontogeny. However, the single mesial 
cusp of M_0 seems almost certain to have inhibited occlusion, at least in 
the propalinal tritylodontid style11,27; therefore it appears unlikely that 
this early-generation molariform could have functioned in chewing. 

Fig. 4 | Early cranial ontogeny of reptiles, non-mammalian cynodonts 
and mammals. a–j, Ontogenetic series of S. punctatus (a–c; hatchling, 
Carnegie Museum 20660; small adult17,18, Queen Mary University of 
London 0614; large adult, Yale Peabody Museum 9194), M. domestica 
(d–g; postnatal day 15, TMM M-7659; postnatal day 27, TMM M-8262; 
postnatal day 48, TMM M-7536; adult, TMM M-7599) and K. wellesi 
(h–j; perinate, TMM 43690-5.035a; small adult, MCZ 8811; large adult, 
MCZ 8812). The colour of the bar corresponds to the length of facial 
region, from the zygomatic root to the tip of the snout, relative to earliest 
ontogenetic stage. Skulls of TMM 43690-5.035a and MCZ 8812 are 
laterally compressed. k, Sphenodon hatchling, Kayentatherium perinate 
and Monodelphis pouch young (postnatal day 27), shown at equal 
magnification. 

Fig. 5 | Manus and limb elements of Kayentatherium perinates. 
a, b, Partial left hand and wrist (TMM 43690-5.032a) in original position 
(a) and in dorsal view (b). c, d, Humerus (c; TMM 43690-5.032a) and its 
distal articular surface (d). e, f, Femur (e; TMM 43690-5.013a) and its 
distal articular surface (f). mc1, first metacarpal; mc5, fifth metacarpal. 

In all the perinates, lower molariforms are better developed than 
upper molariforms. The first occluding lower molariform (M_1) always 
has two fully developed roots; the roots of M_2 are partially developed; 
and M_3 and M_4 are rootless, unconsolidated cusps. A similar pattern 
holds for the upper molariforms, although their roots are generally less 
distinct (Fig. 2c, d, Extended Data Fig. 5, Supplementary Videos 4, 5). 
In tritylodontids and haramiyids, successive tooth eruption, in 
conjunction with the rapid growth and elongation of the dentary, results 
in ‘mesial drift’ of the molariform crowns after root formation, so that 
over time the crowns are pushed forward and the implanted roots curve 
posteriorly28,29 (Fig. 2e). Roots of the perinates are straight rather than 
curved, consistent with their immaturity. The anteriormost molariform 
(M_0) appears to have a single root (Fig. 2c), which on the left side either 
broke off post-mortem or was resorbed as the jaw grew and newer teeth 
moved forward. That the loss of M_0 was imminent, and its connection 
to the dentary tenuous, may explain the absence of that tooth in most 
of the specimens. 

Postcrania of the perinates include elements of both forelimbs 
and hindlimbs. Examination of micro-computed tomography scans 
revealed a semi-articulated hand and wrist curled beneath the jaws of 
a partial skull (Fig. 5a, b, Extended Data Fig. 7). The carpals are ossified 
disks that have yet to take on mature shape and complex articulations. 
Their presence is notable, given that wrist elements tend to develop 
relatively late across tetrapods30. Notable, too, are the expanded 
proximal and distal ends and complex articular surfaces of the metacarpals 
and phalanges, features not developed in similar-sized mammals31 
or reptiles32–34. The limb bones of the perinates likewise exhibit a 
relatively mature shape. For example, the humerus has epicondyles 
and an incipient capitulum and ulnar condyle (Fig. 5c, d, Extended 
Data Fig. 8). These features reflect the epigenetic responsiveness of 
the developing bone to mechanical loading by muscle contraction, 
which begins before hatching35. The length of the perinatal femur is 
83% that of the humerus (4.8 versus 5.8 mm), compared with a roughly 
even ratio in adults36. The presence of a reasonably large femur in the 
Kayentatherium perinates suggests that only a slight delay in hindlimb 
ossification, ancestral for tetrapods, is maintained on the mammalian 
stem lineage and by inference present primitively in mammals31. 

Limb elements in Kayentatherium may reflect a transitional phase 
in the histomorphological evolution of endochondral ossification, in 
association with changes in growth rate relative to basal therapsids37 
and the acquisition of numerous derived features of the postcranial 
skeleton that persist in extant mammals38. In basal tetrapods and reptiles 
(except lepidosaurs), perichondral ossification outpaces endochondral 
ossification as both advance from the shaft towards the ends of each 
limb bone35. Proximally and distally, perichondral bone forms around 
still-cartilaginous ‘cones’, which only later calcify and then ossify to 
form the endochondral articular ends. In mammals, endochondral 
ossification proceeds at the same rate as perichondral ossification. 
Dense cancellous endochondral bone is deposited at the proximal and 
distal ends simultaneously with the perichondral bone forming the 
surrounding collar; no cartilage cone ever develops. Secondary ossifi- 
cation centres subsequently appear in the cartilaginous epiphyses. In 
the perinates, both ends of the humerus and femur have short zones 
of cancellous bone that are denser than the perichondral shafts, indi- 
cating that endochondral ossification is relatively advanced—a condi- 
tion approaching that of mammals (Fig. 5c-f, Extended Data Fig. 8). 
However, the ossified ends appear to have grown around cartilage 
cones, as in ancestral amniotes: the dense cancellous bone encloses 
conical indentations in the proximal end of the humerus and both ends 
of the femur (Fig. 5c-f, Extended Data Fig. 8). Secondary ossifications 
are absent. In mammals, endochondral ossification proceeds at an even 
greater rate, and cartilage cones no longer form, with secondary ossifi- 
cation centres taking their place*”®. 

The new specimens show that tritylodontids retained a primitive 
pattern of reproduction despite sharing a number of derived skeletal 
features with mammals*”>®. The association of a plesiomorphic large 
clutch size and isometric cranial growth in a basal mammaliamorph 
suggests a link between reproductive strategy and brain size’””*. 
However, the first mammaliamorphs may have achieved more balanced 
rates of perichondral and endochondral ossification, in association 
with a postcranial skeleton that now resembled that of mammals more 
closely than that of basal cynodonts*. The origin of Mammaliaformes 
coincided with a 50% increase in relative brain size and emergence 
of the neocortex6,39. With the origin of crown Mammalia, further 
brain enlargement was unequivocally associated with reduced clutch 
size. The exceptional preservation of a large clutch of Kayentatherium 
perinates confirms a close correspondence among these features, 
and provides a more nuanced historical sequence in the evolution of 
mammalian characters. 


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
data, statements of data availability and associated accession codes are available at 
https://doi.org/10.1038/s41586-018-0441-3. 


Received: 14 March 2018; Accepted: 17 July 2018; 
Published online 29 August 2018. 


1. Hopson, J. A. Endothermy, small size, and the origin of mammalian 
reproduction. Am. Nat. 107, 446-452 (1973). 

2. Case, T. J. On the evolution and significance of postnatal growth rates in 
vertebrates. Q. Rev. Biol. 53, 243-282 (1978). 

3. Koyabu, D. et al. Mammalian skull heterochrony reveals modular evolution and 
a link between cranial development and brain size. Nat. Commun. 5, 3625 
(2014). 



4. Rowe, T. B. Definition, diagnosis, and origin of Mammalia. J. Vert. Paleont. 8, 
241-264 (1988). 

5. Liu, J. & Olsen, P. The phylogenetic relationships of Eucynodontia (Amniota: 
Synapsida). J. Mamm. Evol. 17, 151-176 (2010). 

6. Rowe, T. B., Macrini, T. E. & Luo, Z.-X. Fossil evidence on origin of the mammalian 
brain. Science 332, 955-957 (2011). 

7. Sacher, G. A. & Staffeldt, E. F. Relation of gestation time to brain weight for 
placental mammals: implications for the theory of vertebrate growth. Am. Nat. 
108, 593-615 (1974). 

8. Martin, R. D. Relative brain size and basal metabolic rate in terrestrial 
vertebrates. Nature 293, 57-60 (1981). 

9. Kermack, D. M. A new tritylodontid from the Kayenta Formation of Arizona. 
Zool. J. Linn. Soc. 76, 1-17 (1982). 

10. Sues, H.-D. First record of the tritylodontid Oligokyphus (Synapsida) from the 
Lower Jurassic of western North America. J. Vert. Paleont. 5, 328-335 
(1985). 

11. Sues, H.-D. Skull and dentition of two tritylodontid synapsids from the Lower 
Jurassic of western North America. Bull. Mus. Comp. Zool. 151, 217-268 
(1986). 

12. Hill, J. P. V. The development of the Monotremata. Part II. The structure of the 
egg-shell. Trans. Zool. Soc. Lond. 24, 443-456 (1933). 

13. Sander, P. M. Reproduction in early amniotes. Science 337, 806-808 (2012). 

14. Mock, D. W. & Parker, G. A. The Evolution of Sibling Rivalry (Oxford Univ. Press, 
Oxford, 1998). 

15. Sánchez-Villagra, M. R. Developmental palaeontology in synapsids: the fossil 
record of ontogeny in mammals and their closest relatives. Proc. R. Soc. Lond. B 
277, 1139-1147 (2010). 

16. Jasinoski, S. C. & Abdala, F. Aggregations and parental care in the Early Triassic 
basal cynodonts Galesaurus planiceps and Thrinaxodon liorhinus. PeerJ 5, e2875 
(2017). 

17. Regnault, S., Hutchinson, J. R. & Jones, M. E. H. Sesamoid bones in tuatara 
(Sphenodon punctatus) investigated with X-ray microtomography, and 
implications for sesamoid evolution in Lepidosauria. J. Morphol. 278, 62-72 
(2017). 

18. Regnault, S. & Hutchinson, J. R. Sesamoid bones in tuatara. Open Science 
Framework https://osf.io/bds35/ (2017). 

19. Young, N. M. et al. Embryonic bauplans and the developmental origins of facial 
diversity and constraint. Development 141, 1059-1063 (2014). 

20. Wealthall, R. J. & Herring, S. W. Endochondral ossification of the mouse nasal 
septum. Anat. Rec. 288A, 1163-1172 (2006). 

21. Cardini, A. & Polly, P. D. Larger mammals have longer faces because of 
size-related constraints on skull form. Nat. Commun. 4, 2458 (2013). 

22. Jasinoski, S. C., Abdala, F. & Fernandez, V. Ontogeny of the Early Triassic 
cynodont Thrinaxodon liorhinus (Therapsida): cranial morphology. Anat. Rec. 
(Hoboken) 298, 1440-1464 (2015). 

23. Crompton, A. W. & Parker, P. Evolution of the mammalian masticatory 
apparatus. Am. Sci. 66, 192-201 (1978). 

24. Rowe, T. Coevolution of the mammalian middle ear and neocortex. Science 273, 
651-654 (1996). 

25. Kühne, W. G. The Liassic therapsid Oligokyphus (British Museum, London, 1956). 

26. Hu, Y., Meng, J. & Clark, J. M. A new tritylodontid from the Upper Jurassic of 
Xinjiang, China. Acta Palaeontol. Pol. 54, 385-391 (2009). 

27. Crompton, A. W. Postcanine occlusion in cynodonts and tritylodontids. Bull. Br. 
Mus. 21, 27-71 (1972). 

28. Cui, G. & Sun, A. Postcanine root system in tritylodonts. Vertebrata Palasiatica 
25, 245-259 (1987). 

29. Luo, Z.-X., Gatesy, S. M., Jenkins, F. A. Jr, Amaral, W. W. & Shubin, N. H. 
Mandibular and dental characteristics of Late Triassic mammaliaform 
Haramiyavia and their ramifications for basal mammal evolution. Proc. Natl 
Acad. Sci. USA 112, E7101-E7109 (2015). 


30. Richardson, M. K. et al. Heterochrony in limb evolution: developmental 
mechanisms and natural selection. J. Exp. Biol. 312B, 639-664 (2009). 

31. Weisbecker, V. Monotreme ossification sequences and the riddle of mammalian 
skeletal development. Evolution 65, 1323-1335 (2011). 

32. Rieppel, O. Studies on skeleton formation in reptiles. V. Patterns of ossification 
in the skeleton of Alligator mississippiensis Daudin (Reptilia, Crocodylia). 
Zool. J. Linn. Soc. 109, 301-325 (1993). 

33. Rieppel, O. Studies on skeleton formation in reptiles: patterns of ossification in 
the skeleton of Chelydra serpentina (Reptilia, Testudines). J. Zool. (Lond.) 231, 
487-509 (1993). 

34. Rieppel, O. Studies on skeleton formation in reptiles. I. The postembryonic 
development of the skeleton in Cyrtodactylus pubisulcus (Reptilia: Gekkonidae). 
J. Zool. (Lond.) 227, 87-100 (1992). 

35. Carter, D. R., Mikić, B. & Padian, K. Epigenetic mechanical factors in the 
evolution of long bone epiphyses. Zool. J. Linn. Soc. 123, 163-178 (1998). 

36. Sues, H.-D. & Jenkins, F. A. in Amniote Paleobiology: Perspectives on the Evolution 
of Mammals, Birds, and Reptiles (eds Carrano, M. T. et al.) 114-152 (Chicago 
Univ. Press, Chicago, 2006). 

37. Ray, S., Bandyopadhyay, S. & Bhawal, D. Growth patterns as deduced 
from bone microstructure of some selected neotherapsids with special 
emphasis on dicynodonts: phylogenetic implications. Palaeoworld 18, 
53-66 (2009). 

38. Kemp, T. S. The relationships of mammals. Zool. J. Linn. Soc. 77, 353-384 (1983). 

39. Rowe, T. B. in Evolution of Nervous Systems 2 Vol. 2 (ed. Kaas, J.) 1-52 
(Academic, Oxford, 2017). 


Acknowledgements We thank A. Zaman and B. Niessemeyer of the Navajo 
Nation Minerals Division for issuing the permit (dated 27 March 2000) under 
which this specimen was collected, and K. Calsoyas and T. Anderson for 
promoting our collaboration with the Navajo EcoScouts Program. Funding was 
provided by the National Science Foundation (EAR 1561622, IIS-9874781), 
by the Geology Foundation of The University of Texas and by the Jackson 
School of Geosciences. We thank B. Andres, S. Egberts, J. Franzosa, R. Gary, 
E. Gordon, T. Macrini, P. Owen, C. Sagebiel and R. S. Wallace for field, laboratory 
and curatorial assistance; M. Colbert, J. Maisano, J. Berlin and G. Rogers for 
computed tomography scanning the specimens described here; and 
S. Regnault, J. Hutchinson, C. Bell and digimorph.org for Sphenodon computed 
tomography scans. 


Reviewer information Nature thanks H. Sues, L. Wilson and the other 
anonymous reviewer(s) for their contribution to the peer review of this work. 


Author contributions T.B.R. directed collecting and preparation of the specimen. 
E.A.H. performed measurements, quantitative analyses and segmentation of 
computed tomography data, assembled comparative reproductive data and 
prepared the figures and tables. E.A.H. and T.B.R. wrote the manuscript and 
Supplementary Information. 


Competing interests The authors declare no competing interests. 


Additional information 

Extended data is available for this paper at https://doi.org/10.1038/s41586- 
018-0441-3. 

Supplementary information is available for this paper at https://doi.org/ 
10.1038/s41586-018-0441-3. 

Reprints and permissions information is available at http://www.nature.com/ 
reprints. 

Correspondence and requests for materials should be addressed to E.A.H. 
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional 
claims in published maps and institutional affiliations. 


© 2018 Springer Nature Limited. All rights reserved. 


METHODS 

Collection and mechanical preparation methods are described in the 
Supplementary Information. Initial computed tomography scans were performed 
at the Austin Heart Hospital, and other scan data reported here were generated by 
The University of Texas High-Resolution X-ray Computed Tomography Facility 
(UTCT). Detailed scan parameters are available in the Supplementary Information. 
Segmentation was performed in VGStudio Max 2.1. Measurements were performed 
by placing indicators on 3D isosurfaces in VGStudio Max 2.1. Standard-major-axis 
regressions were performed in R (ref. 40) using the package smatr (ref. 41). Standard-major-axis 
regression was chosen for its appropriateness in cases involving uncertainty in the 
independent variable (here, skull length) as well as the dependent variable. 
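
As an illustration of the standard-major-axis fit described above, the following minimal Python sketch estimates the coefficient of allometry, b, from log-transformed measurements. It is not the authors' analysis, which was run in R with smatr; the function name sma_fit is hypothetical, and the input values are the rounded measurements listed in Extended Data Table 1d.

```python
import math


def sma_fit(x, y):
    """Standard-major-axis fit of y on x.

    Returns (slope, intercept, r_squared). The slope is the ratio of the
    standard deviations of y and x, signed by the correlation coefficient.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = math.copysign(math.sqrt(syy / sxx), r)
    intercept = mean_y - slope * mean_x
    return slope, intercept, r * r


# Rounded values from Extended Data Table 1d (perinate, small adult, large adult).
skull_length_mm = [14.0, 138.0, 278.0]
measurements_mm = {
    "face length": [5.2, 52.0, 105.0],
    "pterygoid transverse process length": [0.8, 19.0, 35.0],
}

log_skull = [math.log10(v) for v in skull_length_mm]
results = {}
for name, values in measurements_mm.items():
    log_values = [math.log10(v) for v in values]
    b, _, r2 = sma_fit(log_skull, log_values)
    results[name] = (b, r2)
    print(f"{name}: b = {b:.2f}, R^2 = {r2:.2f}")
```

With the rounded table values, the slopes should land close to the tabulated coefficients (about 1.0 for face length and about 1.3 for the pterygoid process); small differences are to be expected if the published fit used unrounded measurements.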
Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Computed tomography data are archived at The University of 
Texas High-Resolution X-ray Computed Tomography Facility (UTCT) and are 
available from the corresponding author upon reasonable request. 


40. R Core Team. R: A Language and Environment for Statistical Computing https:// 
www.R-project.org/ (R Foundation for Statistical Computing, Vienna, 2017). 

41. Warton, D. I., Duursma, R. A., Falster, D. S. & Taskinen, S. smatr 3 - an R package 
for estimation and inference about allometric lines. Methods Ecol. Evol. 3, 
257-259 (2012). 




LETTER 




Extended Data Fig. 1 | Preparation and scanning of specimen 
(TMM 43690-5). a, Field jacket next to quarry, set on 8-foot-long, 
2-by-4-inch beams; the red object on top is a Swiss army knife. b, Scanning the 
opened jacket at the Austin Heart Hospital. S. Egberts (right) discovered 
the perinates. c, d, Photomicrograph of a right mandible (TMM 
43690-5.135d) (c) exposed on the surface of a chunk of matrix removed from the 
opened jacket (d). At this stage of preparation, the jacket still contained 
maternal bones (black outline) as well as some perinatal remains. 
e, f, Chunk of matrix (TMM 43690-5.135) removed from jacket for 
high-resolution computed tomography scanning. g, Volumetric rendering of 
a sub-volume scan of the chunk showing a maternal thoracic vertebra 
surrounded by perinatal bones. h, Small chunk (TMM 43690-5.013) with 
flecks of perinatal bone exposed on the surface. i, j, Digital radiograph 
(i) of multiple flakes mounted in lucite tube for reconnaissance scan, and 
computed tomography slice (j), at level indicated by red line in i, showing 
perinatal remains (red circles). k, l, Computed tomography scanning (k) 
of individual chunk (TMM 43690-5.013), and high-resolution computed 
tomography slice (l) showing perinatal bones and components of the 
sediment. b, perinatal bones; c, carpals; cl, clay clast; M₁, M₂, lower 
molariforms 1, 2; n, carbonate nodule; ph, phalanges; r, ribs; s, sand 
matrix; sc, scapula; u, ulna. 
Extended Data Fig. 2 | Selected perinatal remains. a, Perinatal bones. 
b, Original positions of bones in a with respect to adult elements. Yellow 
stars indicate perinatal individuals that preserve paired dentaries and that 
were counted towards our census; green stars indicate parts of perinatal 
individuals that were not counted towards our census (or counted as 
one-half; Extended Data Table 1a, Supplementary Table 1). Because stars are 
larger than perinatal bones, the positions of the stars are approximate. 
c, Chunk of original matrix showing adult thoracic vertebra and selected 
perinatal remains in situ. Matrix is rendered transparent. The vertebra in 
c corresponds to that shown in b. Specimen numbers 2, 3, 5, 6, 8, 10, 14 
and 15 correspond to TMM 43690-5.135a-TMM 43690-5.135h; specimen 
numbers 4, 7, 9, 12 and 13 correspond to TMM 43690-5.136a-TMM 
43690-5.136e; specimen number 11 corresponds to TMM 43690-5.137a; 
and specimen number 1 corresponds to TMM 43690-5.139a. c, carpus; I₁, 
lower incisor; M₁, M₂, lower molariforms 1, 2; M¹, first upper molariform; 
mc, metacarpal; ph, phalanges; r, rib; u, ulna; v, vertebra. 
Extended Data Fig. 3 | Skull ontogeny in Kayentatherium. The log-log 
plot shows various skull measurements versus maximum skull length at up 
to three ontogenetic stages (perinate, small adult and large adult). Because 
sample sizes are low, lines connecting the data points are shown in place of 
regression lines. The reference line has a slope or coefficient of allometry, 
b, equal to 1. In analyses of allometry, b < 1 indicates negative allometry 
and b > 1 indicates positive allometry. Raw data are in Extended Data 
Table 1c. Skulls shown are TMM 43690-5.035a, MCZ 8811 and MCZ 8812. 
Legend: maximum skull height; face length; temporal opening length; width of 
transverse process of pterygoid; length of transverse process of pterygoid; 
zygomatic arch height; zygomatic root length; height of tooth-bearing ramus of 
dentary; width of coronoid process of dentary. 
Extended Data Fig. 4 | Mandibular and dental ontogeny in 
Kayentatherium. a-c, Ontogenetic series of right Kayentatherium 
dentaries in lateral view (a, perinate, TMM 43690-5.035b; b, small adult, 
MCZ 8811; c, large adult, MCZ 8812). Curves follow the angle of the 
coronoid process at its base and stars indicate the anterior limit of the 
masseteric fossa. Jaws on the left are shown relative to the large adult, 
MCZ 8812. 
Extended Data Fig. 5 | Dental and mandibular anatomy of 
Kayentatherium perinate (TMM 43690-5.035b). a, Right dentary in 
lateral view. b, Teeth in dorsal view in situ without dentary. c, Right 
dentary in medial view. I₁, lower incisor; I₁r, replacement lower incisor; 
M₁-M₄, lower molariforms 1-4. 
Extended Data Fig. 6 | Postdentary elements of Kayentatherium 
perinate (TMM 43690-5.032a). a, Partial left mandible in medial view, 
with postdentary elements coloured red. b, Section through coronoid 
process and postdentary elements at position indicated in a. c, Partial right 
mandible in medial view. M₁-M₃, lower molariforms 1-3. 
Extended Data Fig. 7 | Additional views of hand and forelimb of Kayentatherium perinate (TMM 43690-5.032a). a, Partial left hand in situ with 
paired jaws, carpals, humerus and additional fragmentary elements. b, Partial left hand in palmar view. I-V, digits I-V. 
Extended Data Fig. 8 | Additional views of humerus and femur of Kayentatherium perinates. a-d, Humerus (TMM 43690-5.032a) in frontal (a, b) 
and side (c, d) views. e-h, Femur (TMM 43690-5.013a) in frontal (e, f) and side (g, h) views. 
Extended Data Table 1 | Sample of adult and perinatal elements, and skull measurements at three ontogenetic stages 

a 
Preserved part of skeleton (perinates)       Count   Minimum individuals 
Pair of dentaries* in skull                  10      10 
Pair of dentaries* not in skull              10      10 
Unpaired dentary* identified to side         4       2† 
Unpaired dentary* not identified to side     32      16† 
Total estimate                                       38 

b 
Preserved part of skeleton (adult)                                       Count   Surface float or excavated block 
Maxillary fragments, containing a total of 6 molariforms                 -       float 
Isolated upper molariforms                                               4       float 
Dentary, left, with 4 erupted molariforms and 1 unerupted molariform     1       float 
Dentary, right, with incisor and 4 erupted molariforms                   1       float 
Humerus, right (broken)                                                  1       float 
Proximal ulna, right                                                     1       float 
Isolated metapodials (also see below)                                    2       float 
Caudal vertebrae                                                         3       float 
Scapula, left                                                            1       block 
Ulna, left                                                               1       block 
Articulated carpals, left                                                7       block 
Articulated digit I (1 metacarpal and 2 phalanges)                       1       block 
Isolated metacarpal                                                      1       block 
Isolated phalanges                                                       2       block 
Ribs and rib fragments                                                   14      block 
Thoracic vertebra                                                        1       block 

c 
Preserved part of skeleton (perinates)    Count 
Humeri                                    2 
Articulated partial hand and wrist        1 
  carpals                                 6 
  metacarpals                             5 
  phalanges                               7 
Femur                                     1 
Isolated metapodial                       1 
Isolated phalanges                        3 
Unidentified limb elements                6 

d 
Skull measurement                                 Perinate‡ (mm)   Perinatal specimen   Small adult§ (mm)   Large adult|| (mm)   n   R²     b 
Maximum skull length (premaxilla to parietal)     14.0             TMM 43690-5.035a     138 (est.)          278                  3   -      - 
Maximum skull height, including dentary           6.9              TMM 43690-5.035a     -                   162                  2   -      - 
Face length (premaxilla to zygomatic root)        5.2              TMM 43690-5.035a     52                  105                  3   1.00   1.00 
Temporal opening length                           6.8              TMM 43690-5.035a     -                   125                  2   -      - 
Width of transverse process of pterygoid          0.6              TMM 43690-5.045a     41                  (distorted)          2   -      - 
Length of transverse process of pterygoid         0.8              TMM 43690-5.045a     19                  35                   3   0.99   1.30 
Zygomatic arch height                             0.9              TMM 43690-5.035a     -                   67                   2   -      - 
Zygomatic root length                             1.8              TMM 43690-5.045a     22                  37                   2   1.00   1.03 
Height of tooth-bearing ramus of dentary          2.0              TMM 43690-5.035b     23                  44                   3   1.00   1.04 
Width (at base) of coronoid process of dentary    3.3              TMM 43690-5.035b     34                  67                   3   1.00   1.01 

a, Summary of perinatal census. b, Inventory of adult elements. c, Inventory of identified perinatal postcrania. d, Skull measurements at three ontogenetic stages, with summary of regressions on skull length. 

*Intact dentary or partial dentary including at least two (of three) erupted molariforms. 

†As a conservative estimate, each unpaired dentary = 0.5 individuals. 

‡All measurements were performed by placing indicators on 3D isosurfaces in VGStudio Max 2.1. Standard-major-axis regressions on skull length were performed in R (ref. 40) using the package smatr (ref. 41) for 
measurements available for n = 3 ontogenetic stages. R² is the square of the correlation coefficient. b is the coefficient of allometry, in which b = 1 indicates isometry, b > 1 indicates positive allometry 
and b < 1 indicates negative allometry. 

§Measurements from MCZ 8811. 

||Measurements from MCZ 8812. 
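
The census arithmetic in panel a can be checked with a short Python sketch. It assumes only the counting rule stated in the footnotes (a pair of dentaries counts as one individual, and each unpaired dentary counts as 0.5 individuals); the variable names are illustrative.

```python
# Counts from panel a; weights follow the footnoted counting rule.
census = [
    ("Pair of dentaries in skull", 10, 1.0),
    ("Pair of dentaries not in skull", 10, 1.0),
    ("Unpaired dentary identified to side", 4, 0.5),
    ("Unpaired dentary not identified to side", 32, 0.5),
]

# Conservative minimum number of perinatal individuals.
minimum_individuals = sum(count * weight for _, count, weight in census)
print(f"Estimated minimum number of perinates: {minimum_individuals:g}")
```

Weighting unpaired dentaries at one-half avoids counting the left and right jaws of a single animal twice.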
Sulfoxaflor exposure reduces bumblebee reproductive success 


Harry Siviter¹*, Mark J. F. Brown¹ & Ellouise Leadbeater¹ 


Intensive agriculture currently relies on pesticides to maximize 
crop yield’. Neonicotinoids are the most widely used insecticides 
globally’, but increasing evidence of negative impacts on important 
pollinators*° and other non-target organisms” has led to legislative 
reassessment and created demand for the development of alternative 
products. Sulfoximine-based insecticides are the most likely 
successor¹¹, and are either licensed for use or under consideration 
for licensing in several worldwide markets’, including within the 
European Union”, where certain neonicotinoids (imidacloprid, 
clothianidin and thiamethoxam) are now banned from agricultural 
use outside of permanent greenhouse structures. There is an urgent 
need to pre-emptively evaluate the potential sub-lethal effects of 
sulfoximine-based pesticides on pollinators¹¹, because such effects 
are rarely detected by standard ecotoxicological assessments, but 
can have major impacts at larger ecological scales'*-'5. Here we 
show that chronic exposure to the sulfoximine-based insecticide 
sulfoxaflor, at dosages consistent with potential post-spray field 
exposure, has severe sub-lethal effects on bumblebee (Bombus 
terrestris) colonies. Field-based colonies that were exposed to 
sulfoxaflor during the early growth phase produced significantly 
fewer workers than unexposed controls, and ultimately produced 
fewer reproductive offspring. Differences between the life-history 
trajectories of treated and control colonies first became apparent 
when individuals exposed as larvae began to emerge, suggesting that 
direct or indirect effects on a small cohort may have cumulative long- 
term consequences for colony fitness. Our results caution against 
the use of sulfoximines as a direct replacement for neonicotinoids. 
To avoid continuing cycles of novel pesticide release and removal, 
with concomitant impacts on the environment, a broad evidence 
base needs to be assessed prior to the development of policy and 
regulation. 

The widespread global use of highly effective neonicotinoid-based 
pesticides has led to the evolution of resistance among several insect 
crop pests!® and has generated worldwide interest in emerging 
sulfoximine-based alternatives that have been shown to be effective in 
targeting some neonicotinoid-resistant species'”~!°. This potential lack 
of cross-resistance may reflect differences in the three-dimensional 
molecular structure that preclude the breakdown of sulfoximines by 
enzymes that are involved in neonicotinoid metabolism””, supporting 
the claim that sulfoximines and neonicotinoids are chemically 
distinct!”. However, as selective agonists of insect nicotinic acetylcholine 
receptors!’, the two pesticide groups share a common biological mode of 
action. This raises major concerns about potential effects on non-target 
species, and particularly on bees. Neonicotinoids, while not lethal to 
bees at field-realistic levels, have severe sub-lethal effects on both social 
and solitary bees, influencing cognition, foraging ability, homing ability, 
reproductive output, colony initiation®”*1>7!-?5, and, potentially, 
pollination services”°. Mathematical modelling has shown that these 
sub-lethal stressors can have considerable negative consequences for 
colony fitness downstream in the colony cycle'*!’. 

To assess whether sulfoxaflor, the first marketed sulfoximine- 
based pesticide, has similar negative effects on bees, we fed either 
untreated sucrose solution (1.8 M) or a sucrose solution containing 
5 µg dm⁻³ (5 ppb) of sulfoxaflor to nascent Bombus terrestris colonies 
reared from wild-caught queens. We based this concentration on 
available estimates for sulfoxaflor residues in forager-collected nectar 
post-spray”’ (Extended Data Fig. 1a), because spray application 
is currently the most common application procedure (although 
products containing sulfoxaflor have also been developed for seed 
treatments and are already available for use on bee-pollinated 
crops in some markets*®). After two weeks of laboratory-based 
exposure, size-matched colonies were placed in the field around 
a university parkland campus following a paired design and were 
no longer provided with additional resources. Staggered weekly 
nocturnal censuses revealed a clear difference in colony demographics 
between control and experimental colonies. The bumblebee colony 
cycle is characterized by an early growth phase in which worker 
numbers increase rapidly to create a large workforce, followed by 
a switch to production of reproductive brood later in the season. 
Between two and three weeks after exposure, detectable differences 
in worker numbers between treated and control colonies began to 
emerge, persisting until close to the end of the colony cycle (Fig. 1a 
and Supplementary Table 2d; analysis using a generalized linear 
mixed-effects model: treatment parameter estimate = −0.28, 95% 
confidence interval = −0.48 to −0.01; treatment:week interaction 
parameter estimate = −0.06, 95% confidence interval = −0.11 to 
−0.01; treatment:week² interaction parameter estimate = 0.11, 95% 
confidence interval = 0.05 to 0.16). 

As the colony cycle progressed, negative impacts on the reproductive 
output of the treated colonies became apparent. Treated and control 
colonies were equally likely to produce male reproductive offspring, 
but treated colonies produced significantly fewer males in total 
(zero-inflated count model, binomial section, treatment parameter 
estimate = 0.71, 95% confidence interval = −0.67 to 2.09; count 
section, treatment parameter estimate = −0.54, 95% confidence 
interval = −0.72 to −0.37; Fig. 2). This difference became apparent 
from approximately week 9 onwards (Fig. 1b). The dry mass of these 
males did not differ from that of males produced by control colonies 
(wᵢ (null model) = 0.974), indicating that our results cannot be 
explained by differential investment in reproductive biomass. Neither 
treated nor control colonies produced an abundance of queens, but 
control colonies produced more gynes than treated colonies (in total, 
36 new gynes from 3 out of 26 control colonies; no new gynes were 
produced by any of the 25 treated colonies); thus our findings hold when 
the total number of sexual offspring is analysed (zero-inflated count 
model, binomial section, treatment parameter estimate = 0.71, 95% 
confidence interval = −0.67 to 2.09; count section, treatment parameter 
estimate = −0.64, 95% confidence interval = −0.81 to −0.46). The 
timing of reproductive onset, queen longevity and colony survival 
did not differ between control and treated colonies (Extended Data 
Fig. 2; survival analyses, treatment parameter estimate for reproductive 
onset = −0.05, 95% confidence interval = −0.41 to 0.31; colony 
longevity = −0.03, 95% confidence interval = −0.43 to 0.38; queen 
survival = −0.07, 95% confidence interval = −0.47 to 0.33). 
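
To make the count-section estimates easier to read, the sketch below converts them into multiplicative effects on the expected number of offspring. It assumes that the count section of the zero-inflated model uses a log link, which is the usual default for Poisson-type count models but is not stated in this passage; the helper name rate_ratio is hypothetical.

```python
import math


def rate_ratio(estimate, lower, upper):
    """Exponentiate a log-scale estimate and its confidence limits.

    Assumes a log link, so exp(estimate) is the ratio of expected counts
    (treated relative to control).
    """
    return tuple(math.exp(v) for v in (estimate, lower, upper))


# Count-section treatment estimates and 95% confidence limits from the text.
males = rate_ratio(-0.54, -0.72, -0.37)
all_sexuals = rate_ratio(-0.64, -0.81, -0.46)

for label, (ratio, low, high) in (("males", males), ("all sexual offspring", all_sexuals)):
    print(f"{label}: ratio = {ratio:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Under that assumption, a ratio below 1 whose confidence interval excludes 1 corresponds to fewer offspring in treated colonies. These model-based ratios are conditional on the zero-inflation component and need not match a percentage reduction computed from raw totals.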


¹School of Biological Sciences, Royal Holloway University of London, Egham, UK. *e-mail: Harry.Siviter.2016@live.rhul.ac.uk 


6 SEPTEMBER 2018 | VOL 561 | NATURE | 109 






Fig. 1 graphic (not reproduced): panels a-c are plotted against week of experiment (0-15) with control and treatment series; panel d is a timeline (weeks 0-14) marking laboratory-based exposure, the move to the field, emergence of adults that had maximum exposure, and maximum colony life span. 


Fig. 1 | The impact of sulfoxaflor exposure on life-history trajectories of 
bumblebee colonies. a-c, Week-by-week colony field census data. 
a, Number of workers from treated (n = 26) and control colonies (n = 26). 
b, Number of sexual offspring. c, Proportion of workers returning to 
the colony with pollen for treated and control colonies (n = 25 and 26, 
respectively; reduced sample size for treated colonies reflects the death of 
one queen in week 2, see Methods). Data are mean ± s.e.m. 
d, Demographic timeline indicates the time points at which the 
laboratory-based exposure started (the exposure period is indicated in 
red); the colonies were moved into the field; adults that encountered 
maximum exposure as larvae should begin to emerge”’ and the maximum 
lifespan of the colony. 


On the basis of the neonicotinoid literature, we considered whether 
this difference in the production of sexual offspring was mediated 
through poor provisioning of larvae by foraging workers®”", at the time 
when sexual offspring were developing. However, daytime foraging 
censuses revealed no significant differences in the relative number 
of bees returning to control and treated colonies (generalized linear 
mixed model, treatment parameter estimate = −0.07, 95% confidence 
interval = −0.32 to 0.19). Similarly, although visual inspection of 
the data suggested that a lower proportion of workers returned with 
pollen to pesticide-treated compared to control colonies from week 
eight onwards (Fig. 1c), this effect did not receive statistical support 
(generalized linear mixed model, week:treatment interaction parameter 
estimate = −0.14, 95% confidence interval = −0.29 to 0.001; treatment 
parameter estimate = 0.46, 95% confidence interval = −0.38 to 1.31) 
and furthermore occurred too late in the colony cycle to explain the 
differences in production of male offspring, which became apparent at 
approximately the same time. We also found no significant differences 
in the size of pollen loads collected between control and 
pesticide-treated colonies (Extended Data Fig. 3). Instead, consideration 
of the timing of differences between control and treated colonies 
suggests that the effects of sulfoxaflor exposure on reproductive output 
were mediated by the early drop in worker numbers that began at 2-3 
weeks after exposure. Bumblebee worker pupae take approximately 
14 days to develop”, so the onset of deceleration of the growth of the 
colony workforce corresponds to the eclosion of individuals that had 
encountered maximum exposure as larvae (Fig. 1d). It remains unclear 
whether this failure to eclose was driven by direct effects on exposed 
larvae®”, or indirect effects, perhaps mediated by poor provisioning”! 
by exposed workers (although note that colonies were provided with 
pollen and sucrose in the laboratory during this time). In either case, 
the resultant drop in worker numbers led to differences in the life- 
history trajectories of control and sulfoxaflor-treated colonies, with 
consequent effects on the reproductive output of treated colonies". 
These knock-on effects of early exposure to a small cohort of colony 
members are entirely consistent with the results of mathematical 
explorations of stress impacts on bee colonies, which predict that chronic 
stress at an early stage can push bee colonies beyond a ‘tipping point’, 
increasing the likelihood of colony failure’. 


Fig. 2 | Male offspring production. The number of male sexual offspring 
produced in sulfoxaflor-treated (n = 25) and control (n = 26) colonies. 
Data are mean ± s.e.m. 

Sulfoxaflor is a systemic pesticide that is soluble in water and is thus 
transported around plant tissues following foliar or seed application. 
The likely exposure trajectory of pesticide treatments on crops differs 
between seed treatments, which deliver prolonged exposure, and spray 
applications, which deliver a short-term dose that is initially high but 
typically declines rapidly. Sulfoxaflor, like neonicotinoid-based pesticides, 
can be administered using both methods, and sulfoxaflor-based 
products that are used as a seed treatment have recently been developed 
for crops that attract bees (including oilseed crops)*'. However, most 
currently marketed preparations are spray applications. The dosage 
used in this study is below US Environmental Protection Agency 
estimates for field-realistic immediate post-spray concentrations of 
sulfoxaflor in forager-collected nectar, and remains below residual 
concentrations estimated at 10 days after spray application (the maximum 
period for which data are available; concentration range over the whole 
period: 5.41-46.97 µg active ingredient (a.i.) per kg, application rate: 
0.045 pounds (0.020 kg) of active ingredient per acre applied twice”’; 
Extended Data Fig. 1a, b). Note that our treatment protocol is particularly 
conservative in that our nascent colonies were fed untreated pollen 
in addition to the syrup provided, potentially producing underestimates 
of the effects on larvae. Post-spray sulfoxaflor residues in pollen 
have been documented to be more than tenfold higher than those in 
forager-collected nectar (Extended Data Fig. 1a, b), ranging from 
510.95 to 50.12 µg a.i. per kg over the same post-spray period’. 
Mitigation measures can be used to reduce bee exposure to sulfoxaflor 
when used as spray treatments (for example, spray application to 
crops that attract bees during bloom is prohibited by law in the United 
States)**. Globally, however, under current usage, such measures are 
often either absent*? or limited to product label recommendations to 
avoid spraying six days before bloom™*. No such measures are possible 
for those products that have been developed as a seed treatment®!. 

The impact of sulfoxaflor identified here can be compared with previous 
experiments that focused on exposure to neonicotinoids. For example, 
bumblebee colonies placed next to oilseed rape fields that were treated 
with neonicotinoids showed a 71% reduction in the mean number of 
queen cocoons found within the nest® and a 32-36% reduction in the 
mean number of males and/or workers produced’. Similarly, colonies 
foraging next to thiacloprid-treated raspberry crops had a 46% reduction 
in reproductive output®* and commercial bumblebee colonies exposed 
to imidacloprid for a period of two weeks had an 85% reduction in the 
number of new queens produced’. Here, we found that sulfoxaflor- 
exposed colonies had a 54% reduction in the total number of 
sexual offspring produced compared with control colonies, suggesting 
that from the perspective of wild pollinators, sulfoxaflor exposure could 
lead to similar environmental impacts as neonicotinoids if used on 
crops that attract bees in the absence of evidence-based legislation. 
Sulfoximine-based pesticides are a newly emerging class of product, 
but are already licensed in many countries worldwide, including China’, 
Canada”* and Australia®®. Within the European Union, where the use of 
certain neonicotinoids is now banned for open-field crops, substances 
containing sulfoxaflor as an active ingredient have been assessed by the 
European Food Safety Authority*’ and approval has been granted for 
use in five member states, and applications from seven more member 
states are currently in progress*®. Our results provide pre-emptive 
evidence that, if exposure at equivalent dosages to those used in our 
study occurs via bee-attractive crops before or during bloom, either 
through spray or seed treatment applications, these products could 
pose a substantial risk to pollinators. The effects that we identified were 
the longer-term outcome of initial short-term exposure, and were only 
detected by monitoring the full colony cycle. Bans and restrictions on 
neonicotinoid-based pesticides have largely been implemented to 
protect important pollinators such as bees, following years of widespread 
use with potential long-term population-level consequences. 
To avoid a situation in which pesticides such as neonicotinoids are 
replaced by products that are similarly contentious, regulatory bodies 
should move towards an evidence-based approach that assesses both 
the lethal and sub-lethal consequences of novel insecticides such as 
sulfoxaflor on non-target organisms, and incentivises integrated 
pest-management approaches before products are licensed for use*”. 


METHODS 


Exposure regime. Sulfoxaflor-based preparations have been developed for use on 
a wide range of bee-attractive crops that flower at varying times of the year. The 
regime used in our study most closely mimics spring-flowering crops in temperate 
environments, allowing comparison with similar neonicotinoid-based studies®”!° 
that also exposed colonies for a short period during the early growth phase. 

Preparations containing sulfoxaflor as an active ingredient are currently most 
commonly applied as a foliar spray. We thus based our pesticide concentrations on 
the best available information from a realistic and bee-relevant spray experiment 
reported by the US Environmental Protection Agency (EPA), in which sulfoxaflor 
was applied to a cotton crop at an application rate of 2 x 0.045 pounds of active 
ingredient per acre. Under this application regime, mean sulfoxaflor residue levels 
in honeybee-collected nectar did not drop below 5 µg a.i. per kg over an 11-day 
period”’ (the maximum period for which data are available; Extended Data Fig. 1a). 
We are confident that our exposure is conservative, because (a) in the same 
experiment, pollen residue levels did not drop below 50 µg a.i. per kg*?” (Extended Data 
Fig. 1b), while we provided all colonies with untreated pollen ad libitum; and (b) this 
application rate is similar to label recommendations for at least some 
sulfoxaflor-based products*’. A second study has also measured residues (in cucumber), but 
application rates were 1.5 times above recommended usage, and the relevance of 
this experiment for bees is unclear as the cucumber tissue that was sprayed and 
sampled was not described“. 

In terms of current usage, our data are most relevant to sulfoxaflor preparations 
when sprayed on crops immediately before or during bloom (note that this practice 
has recently been reviewed and prohibited in the United States””). Although some 
product labels recommend avoidance of spraying six days before bloom™, this 
ignores experimental data showing that residues could remain present in pollen at 
levels that we show to have sub-lethal impacts after this six-day period”’ (Extended 
Data Fig. 1d). Other labels allow spraying during bloom at night*’. To the best of 
our knowledge, no data are currently available on field-realistic residues for seed 
treatment preparations that have been developed for use on oilseed crops and are 
already available in some markets”*. 
Queen rearing. In total, 332 bumblebee (Bombus terrestris audax) queens were 
caught between 28 February and 23 March 2017 in Windsor Great Park, 
Surrey, UK. Chilled queens were transported to the laboratory, where their faeces 
were microscopically examined for parasites (Nosema spp., Apicystis bombi, 
Sphaerularia bombi and Crithidia bombi; 400× magnification). Parasitized individuals 
(n = 54) were removed from the experiment. A second parasite screening was 
carried out after one week (29 further queens were removed, n = 249 queens 
remained). 
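
The screening tallies above can be verified with a few lines of Python; the variable names are illustrative and the numbers are those reported in the text.

```python
# Queens remaining after each parasite screening step.
queens_caught = 332
removed_first_screening = 54
removed_second_screening = 29

after_first = queens_caught - removed_first_screening
after_second = after_first - removed_second_screening

print(f"After first screening: {after_first}")
print(f"After second screening: {after_second}")
```

The second figure agrees with the 249 queens reported as remaining.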

Queens were placed in rearing boxes (67 mm (width) by 127 mm (length) by 
50 mm (depth); Allied Plastics) and were provided with a gravity feeder containing 
an ad libitum supply of 1.8 M sucrose solution (changed weekly; Thorne) and a 
pollen ball (changed twice weekly, unless the queen was laying eggs in which case 
more pollen was added; Biobest). Each queen was housed in a dark/red-lit room 
maintained at 26°C and 50-60% relative humidity. Queens that did not produce 
eggs after eight weeks were removed from the experiment (n = 107). Once a queen 
had produced at least six workers, the colony was moved into a wooden nest box 
(280 mm (width) by 320 mm (length) by 160 mm (depth)) and randomly assigned 
to a treatment group (see ‘Pesticide exposure’). The time taken to reach this stage
varied but was on average 7.2 weeks (± s.d. of 1.5 weeks). On transfer, the queens
underwent a final parasite screening (2 queens removed). Two queens died before
transfer; therefore, 52 colonies reached this stage. The use of colonies from
wild-caught queens is a key feature of our experimental design that enabled us to
(a) have a complete overview of the lifecycle of these colonies (both in the laboratory 
and the field, see below), and (b) use colonies with a life history that was adapted 
to the local environment. 

Pesticide exposure. Prior to pesticide exposure, colonies were allocated randomly 
to control and treatment groups and paired for size according to the number of 
workers present (mean ± s.d. = 8.43 ± 1.87). Each colony was then provided with
an ad libitum supply of either 1.8 M sucrose solution containing 5 µg dm⁻³ (5 ppb)
sulfoxaflor (derived from a stock solution of 1 g dm⁻³ in acetone; Greyhound
Chromatography and Allied Chemicals) or 1.8 M sucrose containing an equivalent
concentration of acetone but no sulfoxaflor for a two-week period. Sucrose solution
was weighed on placement in and removal from the colony; no differences in
consumption were found between treatment groups (wᵢ (null model) = 0.985).
During the exposure period, we recorded the number of workers produced, colony
mass and the number of dead workers on a weekly basis. One queen died during
the exposure period, thus 51 colonies were present at the start of the field
experiment (n = 26 control colonies and n = 25 pesticide-treated colonies).
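The dosing arithmetic above can be checked with a short calculation. This is only an illustrative sketch: the function name and the 1 dm³ batch volume are assumptions made for the example, not details of the protocol.

```python
def stock_volume_needed(target_ug_per_dm3, stock_g_per_dm3, final_volume_dm3):
    """Return the volume of stock (in dm^3) needed to reach the target concentration."""
    stock_ug_per_dm3 = stock_g_per_dm3 * 1_000_000  # 1 g = 1,000,000 ug
    return target_ug_per_dm3 * final_volume_dm3 / stock_ug_per_dm3


# Hypothetical batch: 1 dm^3 of sucrose solution at 5 ug dm^-3 from a 1 g dm^-3 stock.
volume = stock_volume_needed(5, 1, 1.0)
dilution_factor = 1.0 / volume
print(f"stock volume: {volume * 1e6:.1f} ul per dm^3, dilution about 1:{dilution_factor:,.0f}")
```

Under these assumptions, the stock is diluted roughly 200,000-fold, which corresponds to about 5 µl of stock per dm³ of sucrose solution.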

Field placement. After two weeks of exposure in the laboratory, colonies were 
moved into the field. Nest boxes were placed within plastic field boxes (440 mm 
(width) by 710 mm (length) by 310 mm (depth); Really Useful Box) containing 
insulation wrap (Thermawrap) and aluminium foil, and placed at locations around 
the Royal Holloway University of London campus, Egham, UK (45 ha; Extended
Data Fig. 4). Paired colonies were matched for location within the campus, and 
were positioned at least 20 m from one another to reduce drifting. Each colony 
entrance was demarcated by a distinctive visual pattern. Colonies were placed in 
discreet, shaded and southeast-facing locations, and secured with a ratchet strap 
to avoid badger damage. To prevent usurpation attempts from other queens and 
social parasite species (Bombus vestalis), queen excluders were placed on each 
colony. Upon initial placement in the field, the colonies were supplied with a gravity 
feeder containing 46 g 1.8 M sucrose solution, after which they received no further 
food supplements. The process of field placement was staggered over six weeks 
(10 April to 21 May 2017) owing to variation in the date at which queens were initially 
caught. The week of placement was included as a predictor in each statistical 
analysis (see ‘Statistical analysis’). 

Data collection. We combined methodological approaches from previous studies 
on the effects of neonicotinoids on bumblebees*”!, as well as studies on bumblebee 
life history*! to maximize our measurement of both impacts and potential 
mechanisms. We conducted censuses every night such that each colony was 
visited once per week, between the hours of 21:30 and 04:00. Using a red-light 
torch, we recorded the number of live workers (average of three counts), dead 
workers, males and new queens. We also recorded the state of the original queen 
(dead or alive), the presence of gyne larvae and/or pupae, the presence of worker 
larvae and/or pupae, the number of pollen and nectar pots containing stores, 
and the mass of the colony (average of three recordings; EM-30KAM balance, 
A&D Instruments). In cases in which the wax covering prevented observation, 
we peeled it back in order to conduct the count. Weekly censuses continued until 
moribundity, defined as either a live queen and three or fewer workers, or no 
queen and 10 workers or fewer. After the experiment, all sexual offspring that 
had been found in the colonies (n = 600) were dried for 72 h and weighed
(accuracy of ±0.001 g).
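The moribundity criterion used to end the weekly censuses can be written as a simple rule. The sketch below is a minimal illustration; the function and argument names are hypothetical.

```python
def is_moribund(queen_alive: bool, live_workers: int) -> bool:
    """Apply the stated end-point: a live queen with three or fewer workers,
    or no queen with ten or fewer workers."""
    if queen_alive:
        return live_workers <= 3
    return live_workers <= 10


# Example census records: (queen alive?, number of live workers)
for record in [(True, 3), (True, 4), (False, 10), (False, 11)]:
    print(record, is_moribund(*record))
```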

All 51 colonies were also visited during daylight hours twice per week. Colony 
traffic (number of bees entering and leaving the nest) was recorded during 10-min 
counts, once between 9:00 and 13:00 and once between 14:00 and 18:00. We also 
recorded whether returning workers had large (pollen basket was over-flowing) 
or small (pollen enclosed within pollen basket) pollen loads relative to their body 
size. Control and pesticide pairs were always observed directly after one another, 
in a random order. The average daily temperature, humidity and total rainfall were 
obtained from a local weather station (https://wunderground.com). 

Statistical analyses. We used an information theoretical model selection approach. 
For each response variable, the initial candidate set included a full model and all 
subsets, including a null model. Reported parameter estimates and confidence 
intervals are based on full-set averaging of the 95% confidence set (that is, the set 
of models with cumulative Akaike weight >0.95). Model types, error structuring, 
a list of parameters included within each model and parameter estimates are 
provided in Supplementary Tables 1, 2. In brief, to analyse the number of workers 
produced per week, we used a generalized linear model (glmer; Poisson error 
structure) with colony nested within the pair as a random factor, and the week of 
initial field placement (week started), treatment, week of experiment and a two-way 
interaction between treatment and week of experiment as fixed factors. Because the 
number of workers increased to a maximum and then decreased for each colony, 
‘week of experiment’ was modelled as a quadratic factor (ΔAIC between full linear
and full quadratic model: 1206.40). Many colonies did not produce sexual off- 
spring, so we used zero-inflated generalized linear models (zeroinfl) to analyse 
the differences in both the overall number of sexual offspring and the number of 
males produced by colonies, with the week of initial field placement, treatment 
and their interaction as predictors. The number of workers returning to the nest 
was analysed using a zero-inflated generalized linear model (glmmadmb; nega- 
tive binomial error structure) in which treatment, week started, colony week and 
temperature were included as fixed factors and colony as a random factor. The 
proportion of workers returning with pollen was also analysed using a generalized 
linear model (glmmadmb; binomial error structure) with treatment, colony week 
and their interaction, week started, temperature and time of day included as fixed 
factors and colony/pair included as a random factor. Week of reproductive onset 
and queen survival were analysed using a Cox proportional hazards survival 
analysis that contained treatment and week started as fixed factors. All analyses 
were conducted in RStudio (version 1.0.136) using the R packages pscl*?, lme4“4,
glmm*, MuMIn®, survival*’ and glmmADMB*.
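To illustrate how a 95% confidence set is built from Akaike weights, the following sketch computes the weights from a list of AIC values and keeps the best-ranked models until their cumulative weight reaches the chosen level. It is written in Python for illustration only (the analyses themselves were run in R), and the AIC values in the example are invented.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into Akaike weights that sum to 1."""
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def confidence_set(aic_values, level=0.95):
    """Return indices of the best-ranked models whose cumulative weight reaches `level`."""
    weights = akaike_weights(aic_values)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    chosen, cumulative = [], 0.0
    for i in ranked:
        chosen.append(i)
        cumulative += weights[i]
        if cumulative >= level:
            break
    return chosen


# Hypothetical candidate set of three models.
example_aic = [100.0, 102.0, 110.0]
weights = akaike_weights(example_aic)
selected = confidence_set(example_aic)
print([round(w, 3) for w in weights], selected)
```

Parameter estimates would then be averaged across the selected models, weighted by their Akaike weights; that step is omitted here.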

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. The full dataset is available as an open science framework project 
(https://osf.io/acrsy/). 
Extended Data Fig. 1 | Concentrations of sulfoxaflor in forager-collected
resources from a USA EPA cotton study. Mean µg of active ingredient (a.i.)
per kg (mean ± s.e.m.) found in the nectar (a, c, e) and pollen (b, d, f) of
honeybees foraging on cotton crops sprayed with sulfoxaflor. Note the
differences in y-axis scale between graphs, owing to considerably higher
concentrations in pollen. Red lines indicate spray application. Dosage: twice
over ten days at 0.045 pounds a.i. per acre (a, b); once over ten days at
0.045 pounds a.i. per acre (c, d); twice over ten days at 0.089 pounds a.i. per
acre (e, f). The black dotted horizontal line indicates the equivalent amount of
sulfoxaflor (5 ppb) that was fed to sulfoxaflor-treated colonies in sucrose in our
experiment. Data are means from two hives; number of individual bees sampled is
not published”’.
Extended Data Fig. 2 | Timing of colony life-history events. a–c, The probability of reproductive onset (a), queen survival (b) and colony survival (c)
for control (n = 26) and sulfoxaflor-treated (n = 25) colonies (± confidence intervals).
Extended Data Fig. 3 | Pollen foraging. The proportion (mean ± s.e.m.) of foragers returning to the nest with large pollen loads, for control (n = 25) and
pesticide-treated (n = 22) colonies (note that not all of the colonies in the experiment had pollen foragers).
Extended Data Fig. 4 | Distribution of colonies across the Royal Holloway Campus. Blue dots indicate control colonies; red dots indicate treated
colonies. Grid reference: TQ000706; Imagery © Google, Map Data © 2018 Google. 
The genome of the offspring of a Neanderthal
mother and a Denisovan father


Viviane Slon, Fabrizio Mafessoni, Benjamin Vernot, Cesare de Filippo, Steffi Grote, Bence Viola, Mateja Hajdinjak,
Stéphane Peyrégne, Sarah Nagel, Samantha Brown, Katerina Douka, Tom Higham, Maxim B. Kozlikin,
Michael V. Shunkov, Anatoly P. Derevianko, Janet Kelso, Matthias Meyer, Kay Prüfer & Svante Pääbo


Neanderthals and Denisovans are extinct groups of hominins 
that separated from each other more than 390,000 years ago}. 
Here we present the genome of ‘Denisova 11’, a bone fragment
from Denisova Cave (Russia)? and show that it comes from an 
individual who had a Neanderthal mother and a Denisovan father. 
The father, whose genome bears traces of Neanderthal ancestry, 
came from a population related to a later Denisovan found in the 
cave*®. The mother came from a population more closely related
to Neanderthals who lived later in Europe’ than to an earlier 
Neanderthal found in Denisova Cave’, suggesting that migrations 
of Neanderthals between eastern and western Eurasia occurred 
sometime after 120,000 years ago. The finding of a first-generation
Neanderthal-Denisovan offspring among the small number of 
archaic specimens sequenced to date suggests that mixing between 
Late Pleistocene hominin groups was common when they met. 

Neanderthals and Denisovans inhabited Eurasia until they were 
replaced by modern humans around 40,000 years ago (40 ka)’. 
Neanderthal remains have been found in western Eurasia!®, whereas 
physical remains of Denisovans have thus far been found only in 
Denisova Cave*®!!)!2, where Neanderthal remains have also been 
recovered’. Although little is known about the morphology of
Denisovans, their molars lack the derived traits that are typical of 
Neanderthals*". 

DNA recovered from individuals of both groups suggests that they 
diverged from each other more than 390 ka. The presence of small 
amounts of Neanderthal DNA in the genome of ‘Denisova 3’, the
first Denisovan individual to be identified*, indicates that the two 
groups mixed with each other at least once’. It has also been shown 
that Neanderthals mixed with the ancestors of present-day non- 
Africans around 60 ka”*!3, and possibly with earlier ancestors of modern 
humans! !*!; and that Denisovans mixed with the ancestors of present-day
Oceanians and Asians*!®'!7. Denisovans may furthermore have
received gene flow from an archaic hominin that diverged more than a 
million years ago from the ancestors of modern humans®. 

A fragment of a long bone, ‘Denisova 11’ (Fig. 1), was identified
among over 2,000 undiagnostic bone fragments excavated in
Denisova Cave as being of hominin origin using collagen peptide 
mass fingerprinting®. Its mitochondrial (mt)DNA was found to be of 
the Neanderthal type and direct radiocarbon dating showed it to be 
more than 50,000 years old?. From its cortical thickness, we infer that 
Denisova 11 was at least 13 years old at death (Extended Data Fig. 1 and 
Supplementary Information 1). We performed six DNA extractions!®! 
from bone powder collected from the specimen, produced ten DNA 
libraries” from the extracts (Extended Data Table 1 and Supplementary 
Information 2, 3) and sequenced the Denisova 11 genome to an average 
coverage of 2.6-fold. The coverage of the X chromosome was similar 
to that of the autosomes, indicating that Denisova 11 was a female. 
Using three different methods, we estimate that contaminating
present-day human DNA fragments constitute at most 1.7% of the data
(Supplementary Information 2). 

To determine from which hominin group Denisova 11 originated, 
we compared the proportions of DNA fragments that match derived 
alleles from a Neanderthal genome (‘Altai Neanderthal’, also known as
‘Denisova 5’) or a Denisovan genome (Denisova 3), both determined 
from bones discovered in Denisova Cave®*, as well as from a present- 
day African genome (Mbuti)° (Supplementary Information 4). At 
informative sites!, 38.6% of fragments from Denisova 11 carried alleles 
matching the Neanderthal genome and 42.3% carried alleles matching 
the Denisovan genome (Fig. 2a), suggesting that both archaic groups 
contributed to the ancestry of Denisova 11 to approximately equal 
extents (Supplementary Information 4). Approximately equal propor- 
tions of Neanderthal-like and Denisovan-like alleles are found in each 
of the ten DNA libraries originating from Denisova 11 but not in librar- 
ies from other projects that were prepared, sequenced and processed in 
parallel, which excludes an accidental mixing of DNA in the laboratory 
or a systematic error in data processing (Supplementary Information 3). 

To estimate the heterozygosity of Denisova 11, we restrict the anal- 
ysis to transversion polymorphisms to prevent deamination-derived 
substitutions from inflating the estimates, and find 3.7 transversions 
per 10,000 autosomal base pairs. This is over four times higher than the 
heterozygosity of the two Neanderthal (Altai Neanderthal and ‘Vindija 
33.19’) and one Denisovan (Denisova 3) genomes sequenced to date, 
and similar to the heterozygosity seen in present-day Africans. In fact, 
the heterozygosity of Denisova 11 is similar to what would be expected 
if this individual carried one set of chromosomes of Neanderthal 
origin and one of Denisovan origin, as estimated from the number of 
differences between randomly sampled DNA fragments from either 
the Vindija 33.19 or the Altai Neanderthal genome and the Denisova 
3 genome (Fig. 2b and Supplementary Information 5). 
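The restriction to transversions can be made concrete with a small classifier. Deamination damage in ancient DNA produces C→T (and, on the opposite strand, G→A) changes, which are transitions, so counting only purine-to-pyrimidine changes sidesteps them. The sketch below is illustrative and the function name is hypothetical.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(ref: str, alt: str) -> bool:
    """Return True if the substitution swaps a purine for a pyrimidine or vice versa."""
    ref, alt = ref.upper(), alt.upper()
    for base in (ref, alt):
        if base not in PURINES | PYRIMIDINES:
            raise ValueError(f"unexpected base: {base!r}")
    if ref == alt:
        return False
    return (ref in PURINES) != (alt in PURINES)


# C->T and G->A (typical deamination signals) are transitions and are excluded.
for ref, alt in [("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")]:
    print(ref, alt, is_transversion(ref, alt))
```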

Denisova 11 could have had approximately equal amounts of 
Neanderthal and Denisovan ancestry because she belonged to a pop- 
ulation with mixed Neanderthal and Denisovan ancestry, or because 
her parents were each from one of these two groups. To determine 
which of these two scenarios fits the data best, we considered sites at 
which the genomes of the Altai Neanderthal and Denisova 3 carry a 
transversion difference in a homozygous form. At each of these sites, 
we recorded the alleles carried by two randomly drawn DNA fragments 
from Denisova 11. Note that in 50% of cases, both fragments will come 
from the same chromosome, making 50% of heterozygous sites appear 
homozygous. As a consequence, the expected proportion of apparent 
heterozygous sites is 50% for a first-generation (F₁) offspring, whereas
it is 25% in a population at Hardy-Weinberg equilibrium with mixed
ancestry in equal proportions (Supplementary Information 6). We find
that in 43.5% of cases, one fragment from Denisova 11 matches the
Neanderthal genome and the other matches the Denisovan genome,
whereas in 27.3% and 29.2% of cases both fragments match the state
seen in the Neanderthal or the Denisovan genome, respectively
(Fig. 2c). For comparison, when a low-coverage Neanderthal genome
(‘Goyet Q56-1’)’ is analysed in the same way, the two fragments match
different states in 2.1% of cases, while they both match the Neanderthal
state in 90.3% of cases and the Denisovan state in 7.5% of cases (Fig. 2c).

Fig. 1 | Location of Neanderthals, Denisovans and ancient modern
humans dated to approximately 40 ka or earlier. Only individuals from
whom sufficient nuclear DNA fragments have been recovered to enable
their attribution to a hominin group are shown. Full or abbreviated names
of specimens are shown near each individual. Blue, Neanderthals; red,
Denisovans; yellow, ancient modern humans. Asterisks indicate that the
genome was sequenced to high coverage; individuals with an unknown sex
are marked with a question mark. Note that Oase 1 has recent Neanderthal
ancestry (blue dot) that is higher than the amount seen in non-Africans.
Denisova 3 has also been found to carry a small percentage of Neanderthal
ancestry. Data were obtained from previous publications’? $1!-1321-74.

Obviously, the Altai Neanderthal and Denisova 3 are unlikely to be
identical to the genomes of the individuals that contributed ancestry to
Denisova 11. To take this into account, we used coalescent simulations
to estimate the expected proportions of DNA fragments matching a
Neanderthal or a Denisovan genome in populations with demographic
histories similar to those of the Altai Neanderthal and Denisova 3
(Supplementary Information 6). The proportion of cases in which one
of the two DNA fragments sampled from Denisova 11 matches the
Neanderthal state and the other the Denisovan state fits the expectation
for an F₁ Neanderthal-Denisovan offspring, but not an offspring
of two F₁ individuals, an offspring of an F₁ parent and a Neanderthal
or a Denisovan parent, nor an individual from a population of mixed
ancestry at Hardy-Weinberg equilibrium (Extended Data Fig. 2 and
Supplementary Information 6). We conclude that Denisova 11 did not
originate from a population carrying equal proportions of Neanderthal
and Denisovan ancestry. Rather, she was the offspring of a Neanderthal
mother, who contributed her mtDNA, and a Denisovan father.
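The idealized expectations quoted above (50% apparent heterozygous sites for an F₁, 25% for a mixed population at Hardy-Weinberg equilibrium) follow from drawing two fragments at random at each diagnostic site. The sketch below works this through under simplifying assumptions that the text itself relaxes with coalescent simulations: the parental genomes are taken to match the reference genomes exactly, and sequencing error and contamination are ignored. Names are hypothetical.

```python
def apparent_proportions(genotype_freqs):
    """Proportions of sites at which two randomly drawn fragments look NN, ND or DD.

    At a truly heterozygous (ND) site the two fragments come from the same
    chromosome half of the time, so only half of such sites appear as ND.
    """
    nd_true = genotype_freqs["ND"]
    return {
        "NN": genotype_freqs["NN"] + 0.25 * nd_true,
        "ND": 0.5 * nd_true,
        "DD": genotype_freqs["DD"] + 0.25 * nd_true,
    }


# F1 offspring: every diagnostic site is heterozygous.
f1 = apparent_proportions({"NN": 0.0, "ND": 1.0, "DD": 0.0})

# Hardy-Weinberg population with equal ancestry (p = 0.5): p^2, 2pq, q^2.
hwe = apparent_proportions({"NN": 0.25, "ND": 0.5, "DD": 0.25})

print("F1: ", f1)
print("HWE:", hwe)
```

The observed 43.5% of sites with one fragment of each type lies much closer to the F₁ expectation than to the equilibrium expectation, which is the contrast the text draws.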

We next plotted the distribution of sites across the genome, for
which Denisova 11 carries an allele matching the Altai Neanderthal
genome and a different allele matching the Denisova 3 genome. Such
sites are distributed largely uniformly (Fig. 3), as would be expected
for an F₁ offspring of Neanderthal and Denisovan parents. To explore
the ancestry of the parents of Denisova 11, we looked for regions in
the genome that deviate from a pattern consistent with Denisova
11 being an F₁ offspring (Extended Data Fig. 3). Using four tests
for enrichment of Denisovan or Neanderthal ancestry, we identify
at least five approximately 1-Mb long (0.72–0.95 Mb) regions, all of
which are homozygous for Neanderthal ancestry. This suggests that
the Denisovan father of Denisova 11 had some Neanderthal ancestry.
Given conservative estimates of the size and number of these regions,
it is likely that there was more than one Neanderthal ancestor in his
genealogy, possibly as far back as 300–600 generations before his lifetime
(Supplementary Information 7). Notably, the heterozygosity in the
regions of Neanderthal ancestry in Denisova 11 is higher than in the
same regions in the genomes of Vindija 33.19 or the Altai Neanderthal,
suggesting that the Neanderthals that contributed to the ancestry of
Denisova 11’s father were from a different population than her mother
(Supplementary Information 5).

Fig. 2 | Denisova 11 has both Neanderthal and Denisovan ancestry.
a, Percentage of DNA fragments from Denisova 11 matching derived
alleles found on each branch of a tree relating a Neanderthal, a Denisovan
and a present-day human genome. b, Distribution of heterozygosity per
chromosome in two Neanderthals (blue), a Denisovan (red), Denisova
11 (purple) and present-day humans (n = 235 non-African individuals
(yellow) and n = 44 African individuals (orange) from a previous
publication”*), and the expectation for a Neanderthal-Denisovan F₁
offspring (grey). The violins represent the distribution from the minimum
and maximum heterozygosity values for the autosomes of each archaic
hominin and of present-day humans (n = 5,170 pairs of chromosomes
for non-Africans and n = 968 for Africans). White squares represent
autosome-wide estimates for the archaic hominins, and the average of
estimates across individuals for present-day humans. c, Percentage of sites
at which two sampled DNA fragments both carry Neanderthal-like alleles
(NN, blue), Denisovan-like alleles (DD, red), or one allele of each type
(ND, purple); and the expectations for an offspring of a Neanderthal and
a Denisovan (F₁), of two F₁ parents (F₂), and of an F₁ and a Denisovan
(F₁ × D). The expected proportions for simulated Neanderthal and
Denisovan genomes are shown in Extended Data Fig. 2.

Fig. 3 | Distribution of Neanderthal-like and Denisovan-like alleles
across the Denisova 11 genome. Positions for which one randomly drawn
DNA fragment matches the Neanderthal genome and another matches the
Denisovan genome are marked in purple. Positions are marked in blue if
both DNA fragments match the Neanderthal genome and in red if both
match the Denisovan genome. Black lines indicate centromeres. The inset
shows one region out of five (green boxes) for which both chromosomes
carry predominantly Neanderthal-like alleles. For comparison, the
distribution of alleles in this region is shown for a Neanderthal genome
(Goyet Q56-1).

To explore how the mother of Denisova 11 was related to the two
Neanderthals that have been sequenced to high coverage to date, we
evaluated the proportions of fragments from Denisova 11 that match
derived alleles from either of these two Neanderthal genomes. Denisova
11 shares derived alleles seen in the Altai Neanderthal genome in 12.4%
of cases and those present in the Vindija 33.19 genome in 19.6% of
cases, showing that the Neanderthal mother of Denisova 11 came from
a population that was more closely related to Vindija 33.19 than to
the Altai Neanderthal (Supplementary Information 8). We estimate
the population split times of Denisova 11’s Neanderthal mother from
the ancestors of the Altai Neanderthal to approximately 20,000 years
(20 kyr) before the time when the Altai Neanderthal lived, and her
split time from the ancestors of Vindija 33.19 to around 40 kyr before
Vindija 33.19. The population split between the Denisovan father of
Denisova 11 and Denisova 3 is estimated to approximately 7 kyr before
the latter individual (Supplementary Information 8). In Fig. 4, we present
a population scenario that is compatible with these observations as
well as with the population split times and molecular estimates of the
ages of the three high-coverage archaic genomes’. We caution that the
age estimates are associated with uncertainties, for example, regarding
demography, mutation rates and generation times, and note that additional
gene flow events are likely to have affected the population split
times. Nevertheless, that a Neanderthal in Siberia who lived approximately
90 ka shared more alleles with Neanderthals who lived at least
20 kyr later in Europe”’ than with an earlier Neanderthal from the same
cave® suggests that eastern Neanderthals spread into Western Europe
sometime after 90 ka or that western Neanderthals spread to Siberia
before that time and partially replaced the local population. These two
non-mutually exclusive hypotheses could be tested by sequencing the
genomes of early Neanderthals from Western Europe.

Fig. 4 | Relationships and gene flow events between Neanderthal and
Denisovan populations inferred from genome sequences. Diamonds
indicate ages of specimens estimated via branch shortening’; circles
indicate population split times estimated from allele sharing between
Denisova 11 and the high-coverage Neanderthal and Denisovan genomes
(blue and red) and among the three high-coverage genomes (yellow, from
a previous publication”). Markers indicate the means of these estimates,
error bars indicate 95% confidence intervals based on block jackknife
resampling across the genome (n = 523 blocks). Note that the confidence
intervals do not take the uncertainty with respect to population size,
mutation rates or generation times into account. Ages before present are
based on a human-chimpanzee divergence of 13 million years”**. The
arrow indicates Neanderthal gene flow into Denisovans.
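The confidence intervals in Fig. 4 are described as coming from block jackknife resampling across the genome. As a rough illustration of that idea, the sketch below applies an unweighted delete-one block jackknife to an allele-sharing proportion. The exact statistic, block definition and any weighting used in the paper are not given in this section, so everything here (function name, toy counts, the normal-approximation interval) is an assumption.

```python
import math


def block_jackknife(blocks):
    """Delete-one block jackknife for a proportion.

    blocks: list of (matching_sites, total_sites) tuples, one per genomic block.
    Returns (estimate, standard_error).
    """
    n = len(blocks)
    if n < 2:
        raise ValueError("at least two blocks are required")
    total_match = sum(m for m, _ in blocks)
    total_sites = sum(t for _, t in blocks)
    estimate = total_match / total_sites
    # Recompute the statistic with each block left out in turn.
    leave_one_out = [(total_match - m) / (total_sites - t) for m, t in blocks]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)


# Toy data: four blocks of 100 informative sites each (invented counts).
toy_blocks = [(12, 100), (15, 100), (9, 100), (14, 100)]
estimate, se = block_jackknife(toy_blocks)
ci = (estimate - 1.96 * se, estimate + 1.96 * se)
print(f"estimate = {estimate:.3f}, 95% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
```

Leaving out whole blocks rather than single sites accounts for linkage between neighbouring sites, which would otherwise make the interval too narrow.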

In conclusion, the genome of Denisova 11 provides direct evi- 
dence for genetic mixture between Neanderthals and Denisovans 
on at least two occasions: once between her Neanderthal mother 
and her Denisovan father, and at least once in the ancestry of her 
Denisovan father. Therefore, of the six individuals from Denisova Cave 
from whom nuclear DNA is available>®*!)!?, two (Denisova 3 and 
Denisova 11) show evidence of gene flow between Neanderthals and 
Denisovans. We note that of the three genomes”! retrieved from 
modern humans who lived at a time when Neanderthals were present 
in Eurasia (that is, approximately 40 ka or earlier)’, one individual— 
‘Oase 1’—had a Neanderthal ancestor four to six generations back in 
his family tree?’. 

It is notable that one direct offspring of a Neanderthal and a 
Denisovan (Denisova 11) and one modern human with a close 
Neanderthal relative (Oase 1) have been identified among the few 
individuals from whom DNA has been retrieved and who lived at the 
time of overlap of these groups (Fig. 1). In conjunction with the pres- 
ence of Neanderthal and Denisovan DNA in ancient and present-day 
people*>*!3:1617.25-27, this suggests that mixing among archaic 
and modern hominin groups may have been frequent when they 
met. However, Neanderthals inhabited western Eurasia'° whereas 
Denisovans inhabited yet unknown parts of eastern Eurasia>!”. Thus, 
their zones of overlap may have been restricted in space and time. This, 
as well as possibly reduced fitness of individuals of mixed ancestry, 
may explain why Neanderthals and Denisovans remained genetically 
distinct. By contrast, the spread of modern humans across Eurasia after 
around 60,000 years ago may have allowed repeated interactions with 
archaic groups over a wider spatial range. Admixture between them 
may have resulted in archaic populations becoming partly absorbed 
into what were probably larger modern human populations®*. 


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
METHODS


Sampling and pre-treatment of bone powder. An overview of the laboratory 
experiments is shown in Extended Data Table 1. Bone powder was removed from 
the specimen using disposable sterile dentistry drills after the removal of a thin 
layer of surface material. Six samples were collected, each consisting of approxi- 
mately 30 mg of bone powder. Because a previous analysis of the bone revealed that 
it is contaminated with present-day human DNA%, each sample of bone powder was 
incubated with 1 ml 0.5% sodium hypochlorite solution as previously described’® 
and as indicated in Extended Data Table 1, to reduce the amounts of present-day 
human and microbial DNA”. Residual sodium hypochlorite was removed by 
three consecutive 3-min washes with 1 ml water!®. One extraction negative control 
(no powder) was included in each set of extractions. 

DNA extraction and DNA library preparation. DNA was extracted using silica 
columns’ as previously described”, and eluted in 50 µl 10 mM Tris-HCl, 1 mM
EDTA, 0.05% Tween-20, pH 8.0. Subsequently, 10 µl of each DNA extract (including
the extraction negative controls) were used to prepare single-stranded DNA
libraries as previously described!*”®. A library preparation negative control was
included in every experiment. Two additional 5-µl aliquots from extracts E3652
and E3655 were used to generate additional libraries (library preparation setup 
C in Extended Data Table 1), resulting in a total of 10 DNA libraries. The num- 
ber of DNA molecules in the libraries was estimated by digital droplet PCR? or 
quantitative PCR”. Each library was amplified to the plateau while incorporating 
a pair of unique indexes*! using 1 µM primers’! and AccuPrime Pfx DNA polymerase
(Life Technologies)**. Amplification products were purified using the
MinElute PCR purification kit (Qiagen) or SPRI technology** on a Bravo NGS
workstation (Agilent Technologies) as previously described*4. Indexed DNA 
libraries were pooled with libraries from other projects. Heteroduplices, which 
confound DNA separation and concentration measurements in chromatography, 
were removed from the pools by single cycle amplification using Herculase 
II Fusion DNA polymerase (Agilent Technologies)” with primers IS5 and IS6*. 
Prior to deeper sequencing of libraries R5507, R5509, R9880, R9881, R9882, R9883
and R9873, heteroduplices were removed from each library separately. The concen- 
tration of DNA in each pool or each individual library, respectively, was determined 
using the electrophoresis system implemented on the DNA-1000 chip (Agilent 
Technologies). 

Sequencing and data processing. Sequencing was performed on Illumina 
platforms (MiSeq or HiSeq 2500) using 76-cycle paired-end runs adapted to 
double-indexed libraries*!. Bases were called using Bustard (Illumina). Adaptor 
sequences were trimmed and overlapping paired-end reads were merged into single
sequences using leeHom**. Demultiplexing was carried out using jivebunny’.
Sequences generated from a given library were merged using SAMtools*” and 
aligned to the human reference genome (hg19/GRCh37) with the decoy sequences 
as previously described? using BWA** with parameters adjusted to ancient DNA®. 
PCR duplicates were collapsed using bam-rmdup (https://bitbucket.org/ustenzel/ 
biohazard) and DNA fragments of length ≥35 bases that mapped within regions of
unique mappability (Map35_100% from a previous publication’) with a mapping 
quality of 25 or higher’ were used for analyses. Further filtering criteria used for 
certain analyses are described in the Supplementary Information. 
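The fragment filters described above can be summarised as a simple predicate. This is a sketch only: the `Fragment` type and its field names are hypothetical, the mappability check is reduced to a precomputed flag, and the length threshold is treated as inclusive (35 bases or more), which is an assumption about how the cut-off is applied.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    length: int             # fragment length in bases
    mapping_quality: int    # mapping quality reported by the aligner
    in_unique_region: bool  # True if inside the Map35_100% mappability track


def passes_filters(fragment: Fragment, min_length: int = 35, min_mapq: int = 25) -> bool:
    """Keep fragments that are long enough, well mapped and uniquely mappable."""
    return (
        fragment.length >= min_length
        and fragment.mapping_quality >= min_mapq
        and fragment.in_unique_region
    )


fragments = [
    Fragment(52, 37, True),   # kept
    Fragment(30, 37, True),   # too short
    Fragment(60, 10, True),   # mapping quality too low
    Fragment(60, 37, False),  # outside uniquely mappable regions
]
kept = [f for f in fragments if passes_filters(f)]
print(len(kept), "of", len(fragments), "fragments retained")
```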

Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. The computer code used for simulations is included in 
Supplementary Information 6. 

Data availability. Sequences generated from Denisova 11 have been deposited in 
the European Nucleotide Archive under study accession number PRJEB24663. 
Extended Data Fig. 1 | Comparison between cortical thickness of long
bones from modern humans, Neanderthals and Denisova 11. Maximum ...
from the Bronze Age and two Neanderthals compared to the minimum
thickness of Denisova 11 (dashed line).
Extended Data Fig. 2 | Comparison of the genome of Denisova 11
and simulated genomes. Percentage of sites at which Denisova 11
and genomes simulated under the demographic model described in
Supplementary Information 6 carry two Neanderthal alleles (NN, blue),
two Denisovan alleles (DD, red) or one allele of each type (ND, purple).
a, Percentages calculated for two random DNA fragments from Denisova
11 and from simulated F1, F2, Neanderthal (NF) or Denisovan (DF)
genomes. b, Proportions of sites for the simulated genotypes, before
sampling two fragments.
Extended Data Table 1 | DNA extracts and DNA libraries prepared from the Denisova 11 specimen

| Sample | Extr. set | Bone powder [mg] | Pre-treatment [minutes] | Extract ID | Library prep. setup | Input in library [µl] | Molecules in library | Indexed library ID | DNA fragments sequenced | DNA fragments sequenced (L≥35) | Mapped fragments (L≥35, MQ≥25) | Mapped fragments (%) | Unique fragments (L≥35, MQ≥25, Map35_100%) | Average fragment length [bp] | Fragments with C to T substitution |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Denisova 11 | 1 | 27.4 | 15 | E3259 | A | 10 | 2.27E+08 | R5507 | 133,898,498 | 89,793,496 | 2,139,377 | 2.4 | 1,656,500 | 52.7 | 404,188 |
| Denisova 11 | 1 | 27.8 | 15 | E3261 | A | 10 | 2.00E+08 | R5509 | 145,234,847 | 94,543,170 | 1,712,750 | 1.8 | 1,201,280 | 48.8 | 329,411 |
| Denisova 11 | 1 | | | | B | 10 | 4.33E+08* | R5780 | 2,391,986 | 1,565,197 | 171,150 | 10.9 | 152,336 | 56.4 | 31,758 |
| Denisova 11 | 2 | 29.0 | 15 | E3652 | C | 5 | 4.63E+08 | R9880 | 379,368,999 | 228,704,750 | 22,501,299 | 9.8 | 14,767,988 | 56.5 | 3,028,792 |
| Denisova 11 | 2 | | | | C | 5 | 4.03E+08 | R9881 | 333,009,774 | 203,041,282 | 20,747,195 | 10.2 | 13,805,425 | 56.6 | 2,850,253 |
| Denisova 11 | 2 | 29.7 | 15 | E3654 | B | 10 | 4.16E+08* | R5782 | 2,671,910 | 1,669,048 | 81,750 | 4.9 | 72,730 | 54.5 | 15,618 |
| Denisova 11 | 2 | | | | B | 10 | 3.49E+08* | R5783 | 2,348,997 | 1,510,249 | 199,762 | 13.2 | 177,860 | 59.3 | 31,952 |
| Denisova 11 | 2 | 33.5 | 15 | E3655 | C | 5 | 4.19E+08 | R9882 | 368,237,790 | 225,412,495 | 27,395,573 | 12.2 | 17,849,890 | 59.8 | 3,173,678 |
| Denisova 11 | 2 | | | | C | 5 | 3.69E+08 | R9883 | 343,471,978 | 224,455,462 | 28,522,051 | 12.7 | 18,048,369 | 60.4 | 3,215,383 |
| Denisova 11 | 3 | 27.1 | 30 | E3922 | C | 10 | 7.43E+07 | R9873 | 348,156,224 | 222,947,600 | 62,160,161 | 27.9 | 17,009,638 | 53.3 | 4,282,134 |
| Controls | 1 | ENC | 15 | E3262 | A | 10 | 2.55E+07 | R5510 | 12,220 | 4,123 | 38 | 0.9 | 35 | 55.2 | 73 |
| Controls | 1 | LNC | - | - | A | - | 7.60E+06 | R5521 | 12,444 | 2,170 | 11 | 0.5 | 10 | 47.2 | x |
| Controls | 2 | ENC | 15 | E3663 | B | 10 | 1.83E+06* | R5791 | 32,008 | 4,183 | 473 | 11.3 | 412 | 51.2 | 8 |
| Controls | 2 | LNC | - | - | B | - | 2.54E+06* | R5792 | 31,455 | 2,908 | 70 | 2.4 | 58 | 49.0 | 3 |
| Controls | 3 | ENC | 30 | E3926 | C | 10 | 2.73E+07 | R9877 | 61,825 | 13,861 | 2,472 | 17.8 | 2,145 | 57.0 | 9 |
| Controls | 3 | LNC | - | - | C | - | 1.30E+07 | R9888 | 68,130 | 5,275 | 100 | 1.9 | 67 | 46.2 | 6 |



Data are shown by DNA extraction set, and libraries prepared in the same setup are denoted with the same letter (A, B or C). Relevant negative controls are marked in grey. The number of molecules 
in each library was quantified by digital droplet PCR or quantitative PCR (denoted by asterisk). The numbers of DNA fragments sequenced per library are indicated for the combined data from all 
sequencing runs. Mapped fragments were counted if they were at least 35 bases long and mapped to the human reference genome with a mapping quality of 25 or higher; and their percentage was 
calculated out of sequenced fragments of length 35 bases or more. Following the removal of PCR duplicates, unique DNA fragments were retained if they mapped to the reference genome within the 
used mappability track. Such fragments were considered to contain a terminal cytosine (C) to thymine (T) substitution relative to the human reference genome if a putative cytosine deamination was 
within the first three or last three bases of the strand. bp, base pairs; ENC, extraction negative control; Extr., extraction; L, length; LNC, library preparation negative control; Map35_100%, previously 
published mappability track®; MQ, mapping quality; Prep., preparation. 
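
The "Mapped fragments (%)" column can be recomputed from its two neighbouring columns, following the definition in the legend (mapped fragments as a percentage of sequenced fragments of at least 35 bases). The sketch below is illustrative only; the function name is hypothetical, and rounding to one decimal place is an assumption based on how the values are printed in the table.

```python
def mapped_percentage(mapped: int, sequenced_l35: int) -> float:
    """Mapped fragments as a percentage of sequenced fragments (L >= 35)."""
    if sequenced_l35 <= 0:
        raise ValueError("sequenced_l35 must be positive")
    # Rounding to one decimal is an assumption matching the table's display.
    return round(100.0 * mapped / sequenced_l35, 1)


# Counts taken from the table rows for libraries R5780 and R9880.
print(mapped_percentage(171_150, 1_565_197))        # 10.9
print(mapped_percentage(22_501_299, 228_704_750))   # 9.8
```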
Past experience shapes sexually dimorphic neuronal
wiring through monoaminergic signalling 


Emily A. Bayer1 & Oliver Hobert1*


Differences between female and male brains exist across the 
animal kingdom and extend from molecular to anatomical 
features. Here we show that sexually dimorphic anatomy, gene 
expression and function in the nervous system can be modulated 
by past experiences. In the nematode Caenorhabditis elegans, 
sexual differentiation entails the sex-specific pruning of synaptic 
connections between neurons that are shared by both sexes, giving 
rise to sexually dimorphic circuits in adult animals!. We discovered 
that starvation during juvenile stages is memorized in males to 
suppress the emergence of sexually dimorphic synaptic connectivity. 

[Fig. 1a labels: dimorphic connectivity region; black/grey, sex-shared neurons; white, sex-specific neurons; red, hermaphrodite-specific chemical synapse; blue, male-specific chemical synapse between shared neurons.]

These circuit changes result in increased chemosensory 
responsiveness in adult males following juvenile starvation. We 
find that an octopamine-mediated starvation signal dampens the 
production of serotonin (5-HT) to convey the memory of starvation. 
Serotonin production is monitored by a 5-HT1A serotonin receptor 
homologue that acts cell-autonomously to promote the pruning of 
sexually dimorphic synaptic connectivity under well-fed conditions. 
Our studies demonstrate how life history shapes neurotransmitter 
production, synaptic connectivity and behavioural output in a 
sexually dimorphic circuit. 


Fig. 1 | Starvation inhibits male-specific 
synaptic pruning. a, Schematic of adult 
chemical synaptic connectivity based on 
electron micrograph reconstruction™. Left, 
location of dimorphic synaptic connections 
within animal. mMN, male motor neuron; 
mIN, male interneuron. b, Hermaphrodite- 
specific PHB > AVA and PHA > AVG 
synaptic connections are maintained in 
post-dauer adult males as quantified by 
GFP reconstitution across synaptic partners 
Fig. 2 | Juvenile starvation in males results in aberrant maintenance of
juvenile behaviour. a, Males show enhanced chemosensory avoidance 
behaviour following L1 starvation. Left, predicted synaptic input into 
avoidance behaviour’. All experiments used tax-4 (p678) mutants to 
disable amphid input, as previously described°. Control animals are non- 
starved siblings of starved animals. Each dot represents the reversal index 
of one animal over ten experimental trials; vertical magenta bar is median. 
n=number of animals, shown in a, b. P values, two-sided Wilcoxon rank- 
sum test with Bonferroni corrections for multiple testing (see Methods) and 


During sexual maturation, nervous systems develop a number of sex- 
ually dimorphic features. We investigated whether sexually dimorphic 
maturation of the nervous system is modulated by early-life experience 
in the two sexes of C. elegans, male and hermaphrodite”. Among the 
most notable sexual dimorphisms in C. elegans are sex-specific patterns 
of synaptic connectivity>*. These sex-specific wiring differences arise 
during sexual maturation (the L4 stage) via sex-specific pruning of 
‘sex-hybrid’ juvenile connectivity!“ (Fig. 1a). We found that early-life 
experience alters sex-specific synaptic pruning. Specifically, adult 
males that passed as sexually immature juveniles through a develop- 
mental arrest stage induced by unfavourable external conditions 
display no sex-specific pruning of the normally hermaphrodite-specific 
PHB > AVA and PHA > AVG synaptic connections (Fig. 1b, Extended 
Data Fig. 1a, b). By contrast, normally male-specific connections are 
pruned in hermaphrodites that passed as juveniles through this dauer 
stage (Fig. 1c). These observations suggest a sexually dimorphic sensitiv- 
ity to the memory of environmental stress. While a number of stressors 
had no effect on male-specific synapse pruning (Extended Data 
Fig. 1c), starvation of males (but not hermaphrodites) at early juvenile 
stages fully recapitulated the effect of dauer passage on male-specific 
synapse pruning (Fig. 1d, Extended Data Fig. 1d). By contrast, star- 
vation during the L4 stage did not result in defects in male-specific 
synaptic pruning (Fig. 1d). Synaptic pruning is first observable in the 
L4 stage’, demonstrating that starvation prevents pruning if the stress 
occurs before its onset, but does not halt or reverse synaptic pruning. 

The PHB phasmid sensory neurons modulate the avoidance response 
to noxious chemicals (for example, sodium dodecyl sulfate (SDS)) via 
synapses onto the AVA command interneurons in juvenile animals of 
both sexes and in adult hermaphrodites; in adult males, pruning of the 
PHB > AVA synaptic contacts eliminates this behavioural output?®. 
Juvenile-starved adult males retain the ability to respond to SDS, as pre- 
dicted by the failure to prune PHB > AVA (Fig. 2a). By contrast, males 
starved during adulthood (after pruning has already occurred) do not 
show any change in SDS avoidance behaviour (Fig. 2a), consistent 
with normal PHB > AVA pruning. Thus, a starvation experience allows 
males to retain juvenile sensory acuity and results in the loss of an adult 

[Fig. 2a panel labels: control, n = 29; L1 starved, n = 30.]

95% confidence intervals. Neither L1-starved nor continuously fed adult 
males change between adulthood days 1 and 2 (Extended Data Fig. 2a). 
b, Males show mating defects. Each step of mating (schematized on left) 
was scored in adult males following L1 starvation (grey) and compared 
to progeny of the starved generation (black). For behaviours to the left 
of the red bar, efficiency is shown as success rate per animal. For loss of 
hermaphrodite contact, efficiency is shown as failure rate per animal. Data 
shown as mean + s.d. Each dot represents one animal (overlapping points 
graphically omitted). P values, two-sided Wilcoxon rank-sum test. 


behavioural sexual dimorphism. However, maintenance of juvenile 
sensory acuity affects other behaviours in the adult male. Adult males 
differ from adult hermaphrodites in that they search for, and mate 
with, hermaphrodites®’. Following juvenile starvation, males display 
a defect in maintaining contact with hermaphrodites during mating, a 
phenotype previously associated with ablation of AVA (the command 
interneuron postsynaptic to both PHA and PHB)® (Fig. 2b, Extended 
Data Fig. 2b). Overall, adverse juvenile experience prevents male-spe- 
cific synaptic pruning and increases noxious sensory responsiveness, 
but also decreases male mating efficiency. 

In C. elegans, several feeding-dependent behaviours, such as locomo- 
tion and aversive memory, are known to be modulated by the monoam- 
ines serotonin and octopamine: serotonin signals well-fed conditions 
(similar to vertebrates) and octopamine (similar to norepinephrine in 
vertebrates”) signals starvation'*!. To investigate whether monoamine 
signalling regulates male-specific synaptic pruning, we supplemented 
well-fed animals with exogenous octopamine during the L1 stage 
and found that this mimicked starvation, suppressing male-specific 
synaptic pruning of the PHB > AVA and PHA > AVG connections 
(Fig. 3a, Extended Data Fig. 3a). Conversely, starving L1 animals in 
the presence of exogenous serotonin showed normal male pruning of 
the PHB > AVA and PHA > AVG connections, as well as retaining the 
SDS avoidance behavioural dimorphism (Figs. 2a, 3a). Consistent with 
this result, pruning was also rescued in males that were starved in the 
presence of the selective serotonin reuptake inhibitor fluoxetine or in 
mod-5/SERT mutants, which show increased extracellular serotonin!” 
(Fig. 3b, Extended Data Fig. 3b, c). No pruning defects were observed 
in males defective for dopamine production (Extended Data Fig. 3d). 

The effects of exogenous octopamine and serotonin on pruning sug- 
gest that monoamine signalling is involved in regulating male-specific 
synaptic pruning, but do not address the requirement for endogenous 
octopamine and serotonin. Therefore, we measured the expression of 
the rate-limiting enzymes of octopamine and serotonin synthesis. In 
larval C. elegans, octopamine is produced exclusively in the sex-shared 
RIC interneuron class by tyramine-β-hydroxylase (encoded by tbh-1)"?.
Consistent with our results with exogenous octopamine treatment, 
Fig. 3 | Serotonin and octopamine convey feeding and starvation signals via
the ADF neurons. a, Octopamine and 5-HT mimic starvation and feeding, 
respectively. Quantification of PHB > AVA and PHA > AVG 

synaptic connectivity in adults of both sexes using GRASP or iBLINC 

(a, b, e, f). Each dot represents the number of synaptic puncta in one animal; 
blue bars, median, black boxes, quartiles; vertical black lines, range 

(a, b, e, f). n = number of animals, shown in each column (a-f). P values, two-sided
Wilcoxon rank-sum test with Bonferroni corrections (see Methods)

for multiple testing (a-f). Representative images shown (Extended Data 

Fig. 3a). b, mod-5 mutations rescue pruning defects in PHB > AVA and 

PHA > AVG connections in L1-starved adult males. Representative images 
shown (Extended Data Fig. 3c). c, Starvation induces octopamine production. 
Expression of a tbh-1 reporter in RIC in fed or starved L1 animals. Heat-map 
rendered fluorescence intensity images above, quantification below. 
starvation at either the adult'* or L1 stage (Fig. 3c) increases transcrip- 
tion of tbh-1 (Fig. 3c, Extended Data Fig. 4a). The expression of the 
serotonin-synthesizing enzyme tryptophan hydroxylase (encoded by 
tph-1) is also dependent on starvation, at least in adult animals!5, with 
an octopamine receptor conveying the starvation signal’®. At the first 
larval stage, tph-1 is expressed exclusively in two sex-shared classes of 
serotonin-producing neurons, the bilateral NSM and ADF neuron 
pairs'’. Under well-fed conditions, a tph-1 fosmid reporter was 
dimorphically expressed in both NSM and ADF, with L1 males showing
higher expression than hermaphrodites (Fig. 3d, Extended Data Fig. 4b). 
Upon starvation during L1, tph-1 expression decreases acutely in male 
ADF neurons, but not NSM neurons (Fig. 3d, Extended Data Fig. 4b). 
Addition of exogenous octopamine to otherwise well-fed animals 
mimicked the effect of starvation on tph-1 transcription in male ADF
neurons (Fig. 3d), whereas the addition of tyramine did not (Extended 
Data Fig. 4c). Confirming the effects observed with a tph-1 reporter 
transgene, single-molecule fluorescent in situ hybridization (smFISH) 
of endogenous tph-1 mRNA also revealed sexually dimorphic expres- 
sion in ADF neurons, as well as a transcriptional decrease (in ADF, but 
not NSM) following starvation in L1 (Extended Data Fig. 4d). 

Unlike the upregulation of the octopamine-producing tbh-1 gene, 
downregulation of tph-1 reporter expression persisted even after animals 
were returned to food (Extended Data Fig. 4a, e). Furthermore, in con- 
tinuously fed animals, tph-1 expression peaks during the L3 stage in 
both sexes (the onset of sexual maturation), whereas tbh-1 expression is 
unchanged over the course of development (Extended Data Fig. 4a, e). 
Notably, this upregulation never occurs in L1-starved animals, and this 


Scale bars, 10 µm. Magenta bar, median; black boxes, quartiles (c, d).
Anterior left, dorsal up in all figures. d, tph-1 transcription in ADF neurons 

is increased in males and decreased by starvation or exogenous octopamine. 
Expression of a tph-1 transcriptional fosmid in ADF in fed L1 animals, starved 
L1 animals or L1 animals fed in the presence of exogenous octopamine. 

e, Overexpression of tph-1 in ADF mimics the rescuing effect of exogenous 
5-HT. Quantification of PHB > AVA and PHA > AVG synaptic connectivity 
using GRASP and iBLINC. Two independent transgenic lines were tested for 
each experiment. L1-starved animals without transgenic lines are siblings of 
transgenic animals and controls are non-starved adult males with transgenic 
arrays. Representative images shown (Extended Data Fig. 6c). f, Genetic 
ablation of ADF results in failure to prune. Quantification of PHB > AVA and 
PHA > AVG synaptic connectivity in both sexes in lim-4 mutants”. 


starvation memory is dependent on tbh-1 and, thus, on octopamine 
production (Extended Data Fig. 5a). We also observed decreased tph-1 
transcription in ADF but not NSM during the L3 stage in SER-6 
octopamine receptor mutants, corroborating the relevance of octo- 
pamine signalling for regulating serotonin levels during sexual mat- 
uration (Extended Data Fig. 5b). Conversely, exogenous serotonin 
did not suppress the upregulation of tbh-1 (Extended Data Fig. 5c). 
Demonstrating the functional relevance of persistent serotonin down- 
regulation, male-specific pruning defects in L1-starved animals were 
rescued by supplementation of exogenous serotonin during the L3 stage 
(after the animals had been returned to food; Extended Data Fig. 6a). We 
conclude that a temporary increase in starvation-induced octopamine 
production triggers a long-term alteration in serotonin production. 

To further investigate whether levels of serotonin production in the 
sex-shared ADF neuron regulate sex-specific synaptic pruning during 
male sexual maturation, we also used neuron-specific promoters to 
overexpress tph-1 in NSM or ADF during L1 starvation. Overexpression 
of tph-1 in ADF, but not in NSM, rescued sex-specific pruning of 
PHB > AVA and PHA > AVG synapses in males that had been starved 
(Fig. 3e, Extended Data Fig. 6b, c). Through selective removal of sero- 
tonin production from either NSM or ADF neurons (using cell-specific 
tph-1 mutant rescue experiments and genetic elimination of either NSM 
or ADF; Fig. 3f, Extended Data Fig. 6d, e), we found that serotonin 
production specifically in ADF is not only sufficient but also required 
to modulate neuronal male-specific synaptic maturation. This anal- 
ysis also revealed that NSM-secreted serotonin has an earlier role in 
phasmid neuron migration and morphology (Extended Data Fig. 6d). 
Fig. 4 | The ser-4/5-HT1A serotonin receptor acts downstream of the
feeding signal. a, Quantification of PHB > AVA and PHA > AVG synaptic 
connectivity in adult males mutant for serotonin receptors. Each dot 
represents one animal, blue bar represents median, black box represents 
quartiles and vertical black bars represent range (a, c). n = number of 
animals, shown in each column (a-c). P values, two-sided Wilcoxon rank- 
sum test with Bonferroni corrections for multiple testing (see Methods) 
(a-c). b, ser-4 smFISH puncta are present in PHB in both sexes at L1. 
Maximum intensity projections of one half of each animal are shown for 


To investigate how serotonin signals from the ADF neurons in the 
head to the phasmid-interneuron connections in the tail, we analysed 
PHA > AVG and/or PHB > AVA connectivity in males mutant for each 
of the four metabotropic serotonin receptors in C. elegans'”®. In well-fed 
males lacking the ser-4/5-HT1A receptor, juvenile synaptic connec- 
tivity was maintained, phenocopying juvenile starvation (Fig. 4a, 
Extended Data Fig. 7a). This phenotype was neither enhanced by juve- 
nile starvation nor rescued by exogenous serotonin in ser-4 mutants, 
suggesting that the ser-4 serotonin receptor acts directly and non- 
redundantly downstream of the feeding cue (Extended Data Fig. 8a). 
Furthermore, tph-1 transcription levels were unaffected in the ser-4 
mutant, confirming that ser-4 does not feed back onto serotonin 
production itself (Extended Data Fig. 8b). 

In both C. elegans and vertebrates, ser-4/5-HT1A responds to extra- 
synaptic serotonergic signalling'®”°, raising the possibility that seroto- 
nin from ADF neurons might directly modulate phasmid connectivity. 
Using smFISH probes against ser-4 mRNA, we identified ser-4 tran- 
scripts in several neurons, including PHB but not PHA neurons, and 
detected no signals in ser-4 mutants (confirming probe specificity) 
ser-4 smFISH and DAPI panels; the centre slice from the stacks is shown
in the GFP panel. Merge shows overlay of ser-4 smFISH onto DAPI. Each 
grey dot represents the normalized number of ser-4 smFISH puncta in one 
PHB neuron, red bar indicates median and box indicates quartiles. Higher- 
magnification individual z-slices are shown in Extended Data Fig. 7b. 

c, Expression of ser-4 cDNA in PHB or PHA rescues the PHB > AVA and 
PHA > AVG pruning defects, respectively. Two independent transgenic 
lines were evaluated for each promoter. d, Summary of effects of feeding 
(left) and starvation (right). 


(Fig. 4b, Extended Data Fig. 7b). To determine whether ser-4 exerts 
its role neuron-autonomously within the phasmid sensory circuit, we 
performed cell-specific rescue experiments. We found that AVA-driven 
ser-4 was unable to rescue PHB > AVA pruning in ser-4 mutants, but 
that expression of ser-4 in PHB rescued the pruning defect, consistent 
with a cell-autonomous function (Fig. 4c, Extended Data Fig. 7a). 
Serotonin signalling can also act non-cell-autonomously in the 
phasmids to modulate synaptic pruning. Specifically, we found that 
PHB-expressed ser-4 rescued pruning of the PHA > AVG connection 
(Fig. 4c, Extended Data Fig. 7a). Moreover, although we found that 
ser-4 was expressed in the PHB but not PHA neurons, expression 
of ser-4 in PHA rescued the PHA > AVG pruning defect (Fig. 4c, 
Extended Data Fig. 7a). The extensive, male-specific gap junctions 
that connect the PHA and PHB neurons** may allow PHA and PHB 
to readily cross-communicate a signal from a serotonin receptor. 
Early-life stress is known to have long-lasting effects in verte- 
brates!, and this also involves serotonin signalling”. For example, 
prenatal stress in mouse models of human serotonin transporter 
variants affects adult memory, anxiety and depressive-like behaviour”. 
Yet although both behavioural effects and some molecular effects of
early-life stress in vertebrates have been identified, it has been diffi- 
cult to link specific molecular changes to corresponding behavioural 
outcomes, a notion further confounded by complex contributions of 
genetic background”. Here we show that a juvenile starvation stress 
results in lasting circuit and behavioural effects in adult male, but 
not hermaphrodite, C. elegans by affecting serotonin levels during 
sexual maturation (Fig. 4d). We find that serotonin normally signals 
extrasynaptically to act as a cue for male-specific synaptic pruning. 
These results provide insight into both how temporary early-life stress 
can result in specific lasting changes to the nervous system and how 
stress can intersect with sexual maturation and result in differential 
effects between the two sexes, a feature that has also been observed 
in vertebrate models*!*. We anticipate that still-unknown genetic 
components underlying the many differences in sexual maturation 
between the two sexes of both C. elegans and vertebrates will shed 
light on the sexually differential sensitivity to juvenile starvation and 
other early-life stress. 
METHODS


Strains. Wild-type strains were C. elegans variety Bristol, strain N2. Worms were 
maintained by standard methods”. Worms were grown at 20°C on nematode 
growth media (NGM) plates seeded with bacteria (Escherichia coli OP50) as a 
food source. GRASP and iBLINC reagents have been previously described!°?”. A
detailed list of all mutant and transgenic strains used is available in Supplementary 
Table 1. 

Cloning and constructs. To generate pEAB42 (srg-13p::ser-4::SL2::tagRFP), ser-4 
cDNA was amplified off pEAB69 (ser-4 cDNA in pUC57) and inserted into pEAB6 
(srg-13p::fem-3::SL2::tagRFP) using restriction-free cloning to replace the fem-3 
cDNA. To generate pEAB43 (gpa-6p::ser-4::SL2::tagRFP), pEAB59 (flp-18p::ser- 
4::SL2::tagRFP), and pEAB60 (inx-18p::ser-4::SL2::tagRFP), the srg-13 promoter 
was digested out of the SphI-XmaI sites of pEAB42 and each promoter was ligated
in (from pEAB1 (2.2 kb upstream of srg-13 fused to wCherry), pEAB3 (2.6 kb of
the gpa-6 promoter fused to GFP), pEAB10 (3.1 kb of the flp-18 promoter fused to
GFP), and pMO10 (intron 2 of inx-18 with the AIY enhancer site deleted! fused
to wCherry)).

pKA805 (psrh-142::tph-1::GFP) and pKA807 (pceh-2::tph-1::GFP) were gifts 
from K. Ashrafi. 

Microscopy. Worms were anaesthetized using 100 mM sodium azide (NaN3) and 
mounted on 5% agar on glass slides. Worms were analysed with Nomarski optics 
and fluorescence microscopy, using a Zeiss 880 confocal laser-scanning micro- 
scope. Multidimensional data were reconstructed as maximum intensity projec- 
tions using Zeiss Zen software. For GRASP experiments, animals were imaged 
using a 63x objective and puncta were quantified by scanning the original full 
Z-stack for distinct dots in the area where the processes of the two neurons over- 
lap. GRASP experiments were scored blinded to genotype for mutant analysis 
and rescue array analysis, and experimental condition for starvation, heat stress, 
hyperosmotic stress and serotonin rescue experiments. For fluorescence intensity 
experiments, animals were imaged using a 40x objective with fixed imaging set- 
tings and quantification was performed using the Zeiss Zen software by measuring 
the mean fluorescent intensity of the neuronal cell body in the centre Z-slice of 
each neuron, and then averaging the intensity of the left and right neurons of each 
pair to control for differences based on Z-position. The sex of L1 animals was 
determined using rectal epithelial cell morphology and/or coelomocyte position. 
Figure preparation. Plots for GRASP and expression data were generated in R 
using the beeswarm package. Figures were prepared using Adobe Photoshop CS6 
and Adobe Illustrator CS6. 

Statistics and reproducibility. Two-tailed Wilcoxon rank-sum tests were per- 
formed in R in addition to post hoc Bonferroni corrections to adjust P values for 
number of pairwise tests in all cases where more than two pairwise statistical tests 
were performed. Freeman-Halton extension of one-sided Fisher exact tests were 
performed for the categorical data in Extended Data Fig. 6d. All experiments were 
repeated independently (technical replicates) a minimum of twice with similar 
results; any case for which replication failed is indicated in the corresponding figure 
legend. Within each figure, all datasets contain only representative biological 
replicates. No statistical method was used to determine sample size and animals 
were not randomized into trial groups. 
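
The Bonferroni adjustment described above can be sketched in a few lines. The authors report performing their tests in R; the Python below is only an illustrative equivalent of the adjustment step, the function name is hypothetical, and the P values are made up for the example rather than taken from the paper.

```python
def bonferroni(p_values):
    """Multiply each P value by the number of tests, capping the result at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Illustrative raw P values from three pairwise comparisons (not real data).
raw = [0.004, 0.02, 0.3]
adjusted = bonferroni(raw)
for p, q in zip(raw, adjusted):
    print(f"raw P = {p:.3f}  adjusted P = {q:.3f}")
```

According to the text, this adjustment was applied only where more than two pairwise tests were performed, so a caller would skip it for one or two comparisons.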

Starvation and neurotransmitter assays. L1 starvation assays were performed by 
plating a synchronized population of embryos (following hypochlorite treatment of 
gravid adults) onto unseeded NGM and allowing 12 h at 20°C for embryo hatching, 
followed by 24 h at 20°C for starvation (unless otherwise noted) before transfer 
onto seeded NGM by washing L1 animals using M9 buffer. Later starvation assays 
(L3, L4, adult) were performed by washing synchronized populations grown at 
20°C off seeded NGM plates using M9 buffer, performing at least 3 washes in 
M9 at 510 rcf to remove OP50, and then plating animals on NGM plates without
bacto-peptone to prevent the growth of any residual OP50 for 24 h before transfer 
back to seeded NGM. 

For neurotransmitter assays, all drugs were mixed into NGM medium (for 
serotonin starvation assays, without bacto-peptone) before pouring to a final 
concentration of 20 mg/ml (octopamine), 5 mM (serotonin), 5 mM (tyramine) or 
0.1 mg/ml (fluoxetine). Drugs used were 5-hydroxytryptamine hydrochloride 
(Sigma-Aldrich catalogue #H9523), (±)-octopamine hydrochloride (Sigma-Aldrich
catalogue #O0250), tyramine hydrochloride (Sigma-Aldrich catalogue
#T2879), and fluoxetine hydrochloride, USP (Spectrum Chemical catalogue 
# F1200). L1 animals were synchronized onto drug-containing plates identically 
to starvation assays (above) and then transferred onto plain seeded NGM plates 
24 h post-hatching. 

L1 heat stress assay. Heat stress assays were performed by transferring synchro-
nized, well-fed L1 animals (6 h post-hatching) to a 35°C incubator for 30 min, and 
then recovering plates at 20°C until adulthood. 

L1 osmotic stress assay. Hyperosmotic stress assays were performed by growing 
L1 animals on plates containing 200 mM NaCl for 24 h after hatching (in the
continuous presence of food) and then transferring animals to standard NGM
plates until adulthood.

SDS-avoidance behaviour. The SDS avoidance assay was based on procedures 
as described®. A small drop of solution containing either the repellent (0.1% SDS 
in M13 buffer) or buffer (M13 buffer: 30 mM Tris-HCl pH 7.0, 100 mM NaCl,
10 mM KCl) is delivered near the tail of an animal while it moves forward. Once in
contact with the tail, the drop surrounds the entire animal by capillary action and
reaches the anterior amphid sensory organs. The drop was delivered using 10-µl
glass calibrated pipets (VWR international) pulled by hand on a flame to reduce
the diameter of the tip. The capillary pipette was mounted in a holder with rubber 
tubing and operated by mouth. Assayed worms were transferred individually to 
fresh non-wet unseeded NGM plates. Each assay started by testing the animals with 
drops of M13 buffer alone. The response to each drop was scored as reversing or 
not reversing. The avoidance index is the number of reversal responses divided 
by the total number of trials. An interstimulus interval of at least two minutes was 
used between successive drops to the same animal. Each animal was tested ten 
times, and all animals that survived all ten trials were included in the datasets. 
Two biological and technical replicates were performed for each experiment. Each 
replicate began with n = 40 animals for each condition. 
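
The avoidance index defined above is a simple proportion, sketched here for one animal. The function name and the sample responses are hypothetical; only the formula (reversals divided by total trials, ten trials per animal) comes from the text.

```python
def avoidance_index(responses):
    """Fraction of trials in which the animal reversed.

    `responses` holds one boolean per trial: True for a reversal,
    False for no reversal.
    """
    if not responses:
        raise ValueError("at least one trial is required")
    return sum(1 for r in responses if r) / len(responses)


# Illustrative outcome of ten trials for a single animal (not real data).
trials = [True, True, False, True, False, True, True, True, False, True]
print(avoidance_index(trials))  # 0.7
```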

Mate-searching behaviour. The male-leaving assay was based on procedures as 
described®. Males were separated from hermaphrodites at the L4 stage and transferred
to assay plates 24 h later. Assays were performed on NGM plates (5 cm diameter)
seeded with 20 µl OP50 to create a small lawn that was allowed to grow overnight.
Distance from the lawn was recorded every 3 h over a total assay period
of 12 h, and then mean distance from the lawn was calculated for each assayed 
male. Two biological and technical replicates were performed for each experiment. 
Mating behaviour assays. Mating assays were based on procedures as described’. 
Males were picked at the L4 stage and kept apart from hermaphrodites for 24 h, 
either following 24 h of starvation during L1 or not. One male was transferred to 
a plate covered with a thin fresh OP50 lawn containing 10-15 adult unc-31 (e928) 
hermaphrodites. These hermaphrodites move very little, allowing easy recording 
of male behaviour. Hermaphrodites were also isolated from the opposite sex at the 
L4 stage and used 24 h later, and were always well-fed. Animals were monitored 
and the sequence of events was recorded within a 10 min window or until the male 
ejaculated, whichever occurred first. Males were digitally recorded using the Exo 
Labs model 1 camera mounted on Nikon Eclipse E400 compound microscope with 
long-distance ×20 lenses. Per cent efficiency (per mating behaviour step) = 100 ×
(number of successful performances of mating behaviour step/number of times
the male attempted the mating behaviour step). Per cent loss of hermaphrodite
contact = 100 × (number of times male lost contact with hermaphrodite and
eventually re-initiated the mating sequence/number of hermaphrodites the male
contacted during assay). Two biological and technical replicates were performed 
for each experiment. 
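
The two percentages defined above can be sketched as follows. This is an illustrative helper only; the function names and the example counts are hypothetical.

```python
def percent_efficiency(successes, attempts):
    """100 x (successful performances of a step / attempts at that step)."""
    if attempts <= 0:
        raise ValueError("the male must have attempted the step at least once")
    return 100 * successes / attempts


def percent_contact_loss(contact_losses, hermaphrodites_contacted):
    """100 x (times contact was lost and mating was later re-initiated /
    number of hermaphrodites the male contacted during the assay)."""
    if hermaphrodites_contacted <= 0:
        raise ValueError("the male must have contacted at least one hermaphrodite")
    return 100 * contact_losses / hermaphrodites_contacted


# Hypothetical male: 3 successful attempts out of 4 at one mating step,
# and 1 loss of contact across 5 hermaphrodites contacted.
step_efficiency = percent_efficiency(3, 4)
contact_loss = percent_contact_loss(1, 5)
```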

Single molecule fluorescent in situ hybridization (smFISH). smFISH was
performed as described previously. In brief, L1 larvae (6 h post-hatching on food
for control or 24 h starved for L1 starved) were washed off NGM and fixed (4% PFA)
for 45 min at room temperature. Fixed worms were washed twice with PBS, resuspended
in 70% ethanol, and incubated over two nights at 4 °C. The fixed sample was
centrifuged and incubated with wash buffer for 5 min. After wash buffer was removed,
hybridization buffer containing the ser-4 or tph-1 and ric-4 probes (designed using
the Stellaris RNA FISH probe, ser-4 and tph-1 conjugated to CalFluor 590, ric-4
conjugated to Quasar 670, from Biosearch Technologies) was added to the sample.
The sample was incubated overnight at 37 °C. The sample was subsequently
incubated in wash buffer with DAPI for 30 min at 37 °C protected from light.
The sample was suspended in 2× SSC and then resuspended in GLOX buffer for
2 min. The sample was then resuspended in GLOX buffer to which glucose oxidase
was added. The sample was then mounted and imaged immediately. Images were
acquired using an automated fluorescence microscope (Zeiss, AXIO Imager Z.2)
with a 63× objective. Acquisition of Z-stack images (each slice 0.3 µm thick) was
performed with ZEN 2 pro software. Representative images are shown following
max-projection of Z-stacks in Zeiss Zen software. Puncta were quantified by
scanning through the Z-slices sequentially in Zeiss Zen software. Each staining
experiment was performed with two biological and technical replicates.

For ser-4 smFISH, osm-6::gfp (oyIs59) labels all ciliated sensory neurons and was 
used to identify the phasmid neurons. Identification of DVC and anal depressor 
muscle (mu) was based on a previously published ser-4 transcriptional reporter.
The number of smFISH puncta in one PHB neuron was normalized to the number 
of puncta in DVC (in the same animal) to control for staining fluctuations. 
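
The within-animal normalization described above can be sketched as a simple ratio. The function name and the puncta counts below are hypothetical illustrations, not data from the study.

```python
from statistics import median


def normalized_phb_puncta(phb_puncta, dvc_puncta):
    """Normalize the puncta count in one PHB neuron to the DVC count
    from the same animal, to control for staining fluctuations."""
    if dvc_puncta <= 0:
        raise ValueError("DVC puncta count must be positive to normalize")
    return phb_puncta / dvc_puncta


# Hypothetical (PHB, DVC) puncta counts for three animals.
animals = [(4, 8), (6, 6), (9, 6)]
ratios = [normalized_phb_puncta(phb, dvc) for phb, dvc in animals]
median_ratio = median(ratios)
```

Dividing by a reference cell from the same animal means that a uniformly weak or strong staining run scales both counts together and leaves the ratio unchanged.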

For tph-1 smFISH, NSM and ADF neurons were identified by their stereo- 
typed position, and smFISH puncta colocalizing with the DAPI-stained nuclei 
were quantified. 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 
Data availability. The data that support the findings of this study are available
from the corresponding author upon reasonable request. 
Extended Data Fig. 1 | Trans-synaptic labelling by GRASP. a, The
normally hermaphrodite-specific PHB > AVA and PHA > AVG synaptic
connections fail to prune in post-dauer adult males, and the normally
male-specific PHB > AVG and AVG > DA9 connections are pruned
in post-dauer adult hermaphrodites. Top, red cytoplasmic axon label;
middle, GRASP (PHB > AVA, PHB > AVG, AVG > DA9) or iBLINC signal
(PHA > AVG); bottom, magnified inset of colour-inverted synaptic puncta
with arrowheads to indicate puncta. Intestinal auto-fluorescence is labelled
‘gut’. Representative images shown; for quantification and replication,
see Fig. 1b, c and Methods. b, Starvation does not affect expression of
the cell-specific promoters used for GRASP, and thus the effects on
synaptic pruning are not an artefact of changes in promoter expression.
Representative maximum intensity projection images of control animals
and animals recovered from 24 h of L1 starvation are shown. mu, muscle
(srg-13p is variably expressed in some muscle cells in addition to PHA).
c, Neither L1 heat shock (30 min at 35 °C) nor L1 osmotic stress (24 h on
plates with 200 mM NaCl) affects male-specific pruning of the PHB > AVA
connection. Each dot represents one animal (n = number of animals,
shown in each column), blue bars show median, black boxes represent
quartiles and vertical black lines show range. c, d, P values calculated
by two-sided Wilcoxon rank-sum test with Bonferroni corrections for
multiple testing (where applicable; see Methods). d, L1 starvation does not
affect the normally male-specific PHB > AVG and AVG > DA9 synapses.
Control animals are the progeny of starved animals.


© 2018 Springer Nature Limited. All rights reserved. 


[Extended Data Fig. 2a, plot: reversal index (0.0 to 1.0) for the SDS response (reversal versus forward movement) in PHB tax-4 hermaphrodites and PHB tax-4 males.]


Extended Data Fig. 2 | Behaviours unaffected by starvation or
additional adult development. a, SDS avoidance response is unchanged
in day 2 adults. Left, predicted synaptic input into avoidance behaviour by
relevant amphid and phasmid neurons. Each dot represents the average
reversal index of one animal over ten experimental trials, median shown
with vertical magenta bar. P values calculated by two-sided Wilcoxon
rank-sum test. b, Male mate-searching behaviour is unaffected by L1
starvation. Each dot represents the average distance one male travelled
away from a bacterial lawn at four time points over 12 h (n = 48 control
animals, 20 L1-starved animals), in the absence of hermaphrodites.
Magenta bars indicate median, black boxes indicate quartiles. P values
calculated by two-tailed t-test. By contrast, we did find mate-searching
defects in adult males following recovery from dauer, suggesting that
these males may have additional changes to the nervous system (data not
shown).
Extended Data Fig. 3 | Effects of starvation and exogenous or
endogenous monoamine signalling on synaptic connectivity.
a, The normally hermaphrodite-specific PHB > AVA and PHA > AVG
synaptic connections fail to prune in adult males following L1
starvation or treatment with exogenous octopamine during L1, but
pruning can be rescued by exogenous serotonin during L1 starvation. Top, red
cytoplasmic axon label; middle, GRASP (PHB > AVA) or iBLINC signal
(PHA > AVG); bottom, magnified inset of colour-inverted synaptic
puncta with arrowheads to indicate puncta. Scale bars, 10 µm, all panels.
Representative images shown; for quantification and replication, see
Fig. 3a and Methods. b, Increasing extrasynaptic 5-HT with fluoxetine
suppresses the failure to prune synapses. Quantification of PHB > AVA
synaptic connectivity in adults following exposure to fluoxetine during
L1 (on food), L1 starvation without fluoxetine, or L1 starvation in the
presence of exogenous fluoxetine (0.1 mg ml⁻¹). Each dot represents
one animal (n = number of animals, shown in each column), blue bars
show median, black boxes represent quartiles, vertical black lines show
range (b, d). P values calculated by two-sided Wilcoxon rank-sum test
with Bonferroni corrections for multiple testing (where applicable; see
Methods). c, The effect of L1 starvation on male-specific synaptic pruning
is rescued in the mod-5 mutant background. Representative images shown;
for quantification and replication, see Fig. 3b and Methods. d, Loss of
dopamine production (in a mutant for the cat-2 tyrosine hydroxylase) has
no effect on the pruning of the PHB > AVA and PHA > AVG connections
in males.
Extended Data Fig. 4 | Effects of starvation on transcription of tbh-1
and tph-1. a, Time-course of tbh-1 transcriptional levels in fed (solid
lines) and L1-starved (dashed lines) animals. tbh-1 levels are even higher
after 24 h of starvation than after 12 h of starvation, providing a molecular
correlate for our observation that 12 h of starvation is insufficient to
affect male-specific synaptic pruning (Fig. 1d). Larval stages (and hours
post-hatching for fed animals or post-transfer to food for starved animals
at which imaging took place) shown on x-axis. Centre indicates median,
error bars indicate quartiles (a, e). P values calculated by two-sided
Wilcoxon rank-sum test (a–e). n = number of animals (shown below
data points for fed, above for L1-starved) (a, e). b, Expression of a tph-1
transcriptional fosmid in NSM in fed L1 animals, starved L1 animals or
L1 animals fed in the presence of 20 mg ml⁻¹ exogenous octopamine. Each
grey dot represents averaged expression level in one animal. Magenta
bar indicates median, black box represents quartiles (b–d). n = number
of animals (shown in each column) (b–d). c, Expression of a tph-1
transcriptional fosmid is not affected in ADF or NSM neurons (NSM
data not shown) by exogenous tyramine in fed L1 hermaphrodites and
males. TA, tyramine. d, tph-1 transcript levels quantified by smFISH.
Maximum intensity projection images of one half of animal to show one
NSM and one ADF neuron. Merge, overlay of tph-1 smFISH puncta onto
DAPI. Number of tph-1 smFISH puncta was normalized to number of
ric-4/SNAP-25 synaptic protein smFISH puncta in the same neuron to
control for staining fluctuations. Each dot represents one neuron (n = number
of neurons, shown in each column). e, Time-course of tph-1 transcriptional
levels in fed (solid lines) and L1-starved (dashed lines) animals. Larval stages
(and hours post-hatching for fed animals or post-transfer to food for starved
animals at which imaging took place) shown on x-axis. Asterisks (fed L3 animals)
indicate that animals were imaged at different laser settings (60% of all
other time points) to prevent pixel oversaturation in images; thus, we
under-estimate the magnitude of the L3 serotonin spike here.
Extended Data Fig. 5 | Serotonin is downregulated by starvation and
functions downstream, but not upstream, of tbh-1 transcription.
a, tbh-1 is not required for the initial downregulation of tph-1
transcription in ADF upon L1 starvation, but is required for the
persistence of this downregulation into the L3 stage. Neither initial tph-1
downregulation nor a starvation memory was apparent in NSM in a tbh-1
null mutant. In well-fed conditions, there is no significant difference
between control and tbh-1 mutant animals. Centre indicates median,
error bars indicate quartiles. Solid lines indicate continuously fed animals,
dashed lines indicate L1-starved animals. n = number of animals, shown
in each column (a–c). P values calculated by two-sided Wilcoxon
rank-sum test (a–c). b, The ser-6 octopamine receptor is required during sexual
maturation to maintain tph-1 transcription levels in ADF but not NSM
under well-fed conditions, and upon starvation tph-1 transcription levels
in the ser-6 mutant do not further decrease, supporting the necessity of
ser-6 for proper ADF starvation response. Magenta bar indicates median,
black box represents quartiles (b, c). c, Upregulation of tbh-1 transcription
upon L1 starvation is unaffected by the addition of exogenous serotonin
(5 mM), suggesting that serotonin does not act upstream of tbh-1
upregulation (and subsequent octopamine production).
Extended Data Fig. 6 | Regulation and neuronal migration effects
of serotonin signalling. a, Effects of L1 starvation on male-specific
synaptic pruning can also be rescued by exogenous serotonin during L3
(while animals are feeding). Each dot represents one adult male animal
(n = number of animals, shown in each column), blue bar represents
median, black box represents quartiles, vertical black bars represent range
(a, b, e). P values calculated by two-sided Wilcoxon rank-sum test
(a, b, e). b, tph-1 overexpression in NSM does not rescue starvation effects
on pruning. Quantification of PHB > AVA and PHA > AVG synaptic
connectivity in L1-starved adult males overexpressing tph-1 in NSM. Two
independent transgenic lines were tested for each experiment;
L1-starved animals without transgenic lines are siblings of transgenic
animals; controls are non-starved adult males with transgenic arrays. None
of the transgenic lines resulted in partial or complete rescue.
c, Overexpression of tph-1 under an ADF-specific promoter during
L1 starvation rescues the male-specific pruning of the PHB > AVA
and PHA > AVG synaptic connections. Representative images shown;
for quantification and replication, see Fig. 3e and Methods. d, tph-1
(n4622) and ttx-3 (ot22); unc-86 (n846) mutants (in which the NSM
neuron does not express tph-1 or produce serotonin) have cell body
displacement, dendrite, and axon fasciculation defects in the phasmids.
Overexpression of tph-1 under an NSM-specific promoter in the tph-1
(n4622) mutant background rescues the severity and penetrance of these
defects. Representative images of defects in the PHB neuron are shown
here as inverted black and white fluorescence images. Asterisk indicates
dendrite defect, arrow shows anteriorly shifted cell body, arrowhead shows
fasciculation defect. Scale bars, 10 µm. Per cent of animals with visible
defects categorized and quantified to the right, n = number of animals,
shown in each column. P values calculated by Freeman-Halton extension
of one-sided Fisher exact test. e, Overexpression of tph-1 under an
NSM-specific promoter in the tph-1 (n4622) mutant background (essentially,
an ADF-specific tph-1 null) results in male-specific PHB > AVA pruning
defects. Of two independent NSM::tph-1 transgenic lines, one resulted in a
slight but insignificant defect in pruning, and one resulted in a substantial
defect in pruning.
Extended Data Fig. 7 | ser-4 expression in PHB is permissive for
synaptic pruning. a, The normally hermaphrodite-specific PHB > AVA
and PHA > AVG synaptic connections fail to prune in ser-4 mutant
males, but pruning can be rescued by cell-specific expression of ser-4
cDNA in PHB or PHA. Top, red cytoplasmic axon label; middle, GRASP
(PHB > AVA) or iBLINC signal (PHA > AVG); bottom, magnified inset of
colour-inverted synaptic puncta with arrowheads to indicate puncta. Scale
bars, 10 µm, all panels. Representative images shown; for quantification
and replication, see Fig. 4c and Methods. b, ser-4 smFISH puncta are
present in PHB in both sexes at L1. Three consecutive individual z-slices
taken from the maximum intensity projections in Fig. 4b are shown,
moving laterally through each animal from top to bottom rows. Dotted
circles outline the PHB nuclei (DAPI) identified using osm-6::gfp (see
Fig. 4b). Arrowheads in the top row indicate the locations of ser-4 puncta,
which fade out of focus as the slices progress laterally. For quantification
and replication, see Fig. 4b and Methods.
Extended Data Fig. 8 | ser-4 is necessary for PHB > AVA and
PHA > AVG synaptic pruning and acts downstream of the ADF
serotonin signal. a, Starvation does not enhance the male synaptic
pruning defect in ser-4 mutants, and the ser-4 mutant phenotype cannot
be rescued by exogenous serotonin. Each dot represents one animal
(n = number of animals, shown in each column), blue bar represents
median, black box represents quartiles, vertical black bars represent
range. P values calculated by two-sided Wilcoxon rank-sum test (a, b).
b, Expression of a tph-1 transcriptional fosmid in NSM and ADF does
not significantly differ between wild-type and ser-4 mutant L1 animals.
Each dot represents the expression level of one animal, n = number of
animals, shown in each column. Magenta bar indicates median, black box
represents quartiles.
Alpha-kinase 1 is a cytosolic innate immune
receptor for bacterial ADP-heptose


Ping Zhou, Yang She, Na Dong, Peng Li, Huabin He, Alessio Borio, Qingcui Wu, Shan Lu, Xiaojun Ding, Yong Cao,
Yue Xu, Wenqing Gao, Mengqiu Dong, Jingjin Ding, Da-Cheng Wang, Alla Zamyatina & Feng Shao


Immune recognition of pathogen-associated molecular patterns
(PAMPs) by pattern recognition receptors often activates
proinflammatory NF-κB signalling1. Recent studies indicate
that the bacterial metabolite D-glycero-β-D-manno-heptose
1,7-bisphosphate (HBP) can activate NF-κB signalling in host
cytosol, but it is unclear whether HBP is a genuine PAMP
and the cognate pattern recognition receptor has not been
identified. Here we combined a transposon screen in Yersinia
pseudotuberculosis with biochemical analyses and identified
ADP-β-D-manno-heptose (ADP-Hep), which mediates type III secretion
system-dependent NF-κB activation and cytokine expression.
ADP-Hep, but not other heptose metabolites, could enter host
cytosol to activate NF-κB. A CRISPR-Cas9 screen showed that
activation of NF-κB by ADP-Hep involves an ALPK1 (alpha-kinase
1)-TIFA (TRAF-interacting protein with forkhead-associated
domain) axis. ADP-Hep directly binds the N-terminal domain
of ALPK1, stimulating its kinase domain to phosphorylate and
activate TIFA. The crystal structure of the N-terminal domain of
ALPK1 and ADP-Hep in complex revealed the atomic mechanism
of this ligand-receptor recognition process. HBP was transformed
by host adenylyltransferases into ADP-heptose 7-P, which could
activate ALPK1 to a lesser extent than ADP-Hep. ADP-Hep (but
not HBP) alone or during bacterial infection induced
Alpk1-dependent inflammation in mice. Our findings identify ALPK1
and ADP-Hep as a pattern recognition receptor and an effective
immunomodulator, respectively.

Gram-negative bacteria such as Yersinia, Salmonella, Burkholderia
and enteropathogenic Escherichia coli induce NF-κB-mediated
cytokine expression in a type III secretion system (T3SS)-dependent
manner. Consistently, infection of 293T cells with Y. pseudotuberculosis
Δ6 (lacking the six T3SS effectors; Δ6 is omitted hereafter)
robustly activated NF-κB-driven luciferase and eGFP reporters
(Fig. 1a, b). From 21,000 Y. pseudotuberculosis transposon mutants,
we identified 37 defective in activating both reporters. Most mutations
were in T3SS-encoding genes. One mutant that was more impaired
than the T3SS-deficient ΔyopB strain had a functional T3SS with its
transposon inserted in hldE (Fig. 1a–c and Extended Data Fig. 1a).
Expression of HldE in the transposon or the ΔhldE mutant restored
infection-induced NF-κB activation. HldE, together with GmhA and
GmhB, synthesizes ADP-D-glycero-β-D-manno-heptose (ADP-DD-Hep;
β is omitted hereafter) from D-sedoheptulose 7-phosphate (S7P)
through D-glycero-β-D-manno-heptose 7-phosphate (H7P), HBP
and D-glycero-β-D-manno-heptose 1-phosphate (H1P) (Extended
Data Fig. 1b). ADP-DD-Hep and ADP-LD-Hep
(ADP-L-glycero-β-D-manno-heptose) undergo interconversion, catalysed by HldD.
Deletion of gmhB or hldD did not affect Y. pseudotuberculosis-induced
activation of NF-κB, whereas deletion of gmhA phenocopied the
ΔhldE strain (Fig. 1b). This seems to suggest that HBP, but not H1P
or ADP-Hep, determines Y. pseudotuberculosis-dependent activation
of NF-κB, echoing the analyses in Neisseria meningitidis. However,
Y. pseudotuberculosis ΔgmhB, unlike the ΔhldE and ΔgmhA strains,
still supported ADP-Hep-dependent autotransporter heptosylation
(Extended Data Fig. 1c), because of an unknown redundancy to gmhB
in ADP-Hep biosynthesis11. When electroporated into 293T cells,
synthetic HBP and ADP-DD-Hep or ADP-LD-Hep, but not S7P,
stimulated NF-κB activation, with ADP-Hep being the most potent
(Fig. 1d). H1P was even less active than HBP (Extended Data Fig. 1d).
When added directly to 293T cells, only ADP-DD-Hep and
ADP-LD-Hep induced activation of NF-κB and production of interleukin (IL)-8
(Fig. 1d, e). This explains the use of transfection in recent reports on
HBP. Thus, ADP-Hep is a potent and versatile PAMP.

Activation of the NF-κB–eGFP reporter by extracellular ADP-Hep
enabled us to carry out a fluorescence-activated cell sorting (FACS)-based
genome-wide CRISPR-Cas9 screen (Extended Data Fig. 2a). Following
a counterscreen against TNF stimulation, we identified ALPK1, TIFA
and TRAF6 (each hit by more than one guide RNA (gRNA)) that were
required for ADP-Hep-induced NF-κB–eGFP expression (Fig. 2a and
Supplementary Table 1). Upon phosphorylation at T9, TIFA forms
foci to activate TRAF6-dependent NF-κB signalling. During the
preparation of this manuscript, ALPK1 was identified as contributing
to activation of NF-κB in Shigella flexneri and Helicobacter pylori.
Deletion of ALPK1 or TIFA (Supplementary Table 2) abolished
ADP-LD-Hep-induced activation of NF-κB and expression of IL-8
(Fig. 2b, c and Extended Data Fig. 2b, c). Defective NF-κB activation
in ALPK1−/− cells was restored by wild-type ALPK1 but not by its
kinase-inactive K1067M mutant (Fig. 2b); TIFA−/− was rescued by
wild-type TIFA but not by a T9A mutant (Extended Data Fig. 2c).
ADP-LD-Hep induced phosphorylation of TIFA at T9 and formation
of eGFP–TIFA foci dependent upon ALPK1 kinase activity (Fig. 2d
and Extended Data Fig. 2d). ADP-LD-Hep did not affect the cytoplasmic
localization of ALPK1 and induced no myosin phosphorylation
(Extended Data Fig. 2e, f). ALPK1 and TIFA were also required for
the activation of NF-κB by electroporation of ADP-Hep (Extended
Data Fig. 2g).

ADP-LD-Hep triggered co-immunoprecipitation of TIFA with
ALPK1 and TRAF6 (Extended Data Fig. 2h). Deletion of ALPK1 did
not affect activation of NF-κB by TNF, NOD1 or NOD2 (which mediate
Salmonella-induced NF-κB activation) or MYD88 overexpression
(Extended Data Fig. 1e). Cells lacking NOD1 and NOD2 showed intact
NF-κB responses and TIFA foci following treatment with ADP-LD-Hep
(Extended Data Fig. 1f, g). Thus, ADP-Hep activates NF-κB specifically
through the ALPK1-TIFA-TRAF6 axis.

Induction of IL-8 expression by Y. pseudotuberculosis, which
required hldE and yopB, was blocked by deletion of ALPK1 (Fig. 2e).
Infection-induced activation of NF-κB and formation of eGFP–TIFA
foci required ALPK1-dependent phosphorylation of TIFA at T9
(Extended Data Fig. 3a–e). Other bacteria, such as diffuse-adhering
E. coli (DAEC), enterotoxigenic E. coli (ETEC) and Burkholderia
cenocepacia also triggered NF-κB activation mediated by the ALPK1-TIFA
axis in an hldE-dependent manner (Extended Data Fig. 3f, g).
Thus, sensing of ADP-Hep by ALPK1 is not limited to T3SS-containing
bacteria.

Fig. 1 | Transposon screen of Yersinia T3SS-dependent NF-κB
activation identifies ADP-Hep as a PAMP. a, b, 293T cells were infected
with indicated Yersinia strains. pHldE is a plasmid expressing HldE.
c, Transposon insertion in hldE. d, e, 293T (d) or PMA-differentiated
THP-1 cells (e) were electroporated (d) or extracellularly treated (d, e)
with indicated sugars. IL8 mRNA (also known as CXCL8, shown relative
to 18S rRNA) and IL-8 secretion were measured by quantitative
real-time PCR (qPCR) and enzyme-linked immunosorbent assay (ELISA),
respectively. NF-κB activation was measured by luciferase activity (a, d)
or eGFP reporter expression (b) (scale bar, 50 µm). Data shown as
mean ± s.d. from three technical replicates (a, d, e), and representative of
three (a, b, d) and two (e) independent experiments. a, d, e, Two-tailed
unpaired Student's t-test (**P < 0.01, ***P < 0.001).

ALPK1 contains an α-helical domain and a kinase domain linked
by an extensive unstructured region (Fig. 2f). ALPK1-N492
(residues 1–492) and ALPK1-ΔN492 (lacking these residues) were
co-immunoprecipitated, independently of ADP-LD-Hep (Fig. 2g).
Co-expression of ALPK1-N492 and ALPK1-ΔN492, or minimally the
N-terminal and kinase domains of ALPK1 (ALPK1-NTD (1–473)
and ALPK1-KD (959–1244), respectively), was sufficient to allow
ADP-LD-Hep or Y. pseudotuberculosis to induce activation of NF-κB
and phosphorylation of TIFA (Fig. 2h and Extended Data Fig. 4a–e).
Unexpectedly, a purified complex of ALPK1-NTD and ALPK1-KD
(ALPK1-(N+K); Extended Data Fig. 5a) could directly phosphorylate
T9 of TIFA (Fig. 3a). By contrast, activation of ALPK1-(N+K)
in mammalian cells required infection or stimulation by ADP-Hep.
Fig. 2 | CRISPR-Cas9 screens identify an ALPK1-TIFA axis that
mediates activation of NF-κB induced by ADP-Hep or Yersinia. a, Top
eight gRNA hits from CRISPR-Cas9 screen of ADP-LD-Hep-induced
NF-κB activation. b–e, h, Wild-type or ALPK1−/− 293T cells expressing
the indicated ALPK1 mutants were treated with ADP-LD-Hep (b–d, h)
or HBP (c), or infected with Y. pseudotuberculosis (e, h). d, Anti-Flag and
anti-pT9-TIFA immunoblots of 293T cells. f, Domain organization of
ALPK1. g, Co-immunoprecipitation of ALPK1 N- and C-terminal regions
from 293T cells treated with or without ADP-LD-Hep. NF-κB activation
and IL8 mRNA were assessed by luciferase reporter activity (b, h) or
qRT-PCR (c, e), respectively. KO-1/2, two ALPK1−/− clones. Data shown as
mean ± s.d. from three technical replicates (b, c, e, h) and representative of
three (a, b, d, h, g) or two (c, e) independent experiments. b, c, e, h,
Two-tailed unpaired Student's t-test (**P < 0.01, ***P < 0.001).

Notably, ALPK1-(N+K) from E. coli ΔhldE did not phosphorylate
TIFA (Fig. 3a). Small-molecule extracts from wild-type but not
ΔhldE mutant E. coli-derived ALPK1-NTD, when added to 293T cells,
potently stimulated ALPK1-dependent phosphorylation of TIFA and
activation of NF-κB, the latter of which correlated with the amount of
ALPK1-NTD (Fig. 3b, c).

High-performance liquid chromatography (HPLC) of the small- 
molecule extracts identified one active fraction (no. 6, the only one 
with high ultraviolet absorption) (Extended Data Fig. 5b-d). Mass 
spectrometry of fraction 6 uncovered three dominant ions with mass- 
to-charge ratios (m/z) of 619.8, 347.9 and 427.8, matching those of 
ADP-Hep, AMP and ADP, respectively (Extended Data Fig. 5e). The 
presumed ADP-Hep ion showed a similar retention time and fragmen- 
tation pattern to synthetic ADP-Hep (Extended Data Fig. 5f, g). The 
mass of native ALPK1-NTD exceeded that of denatured ALPK1-NTD 
by 619.32 Da (one ADP-Hep) (Fig. 3d). Direct binding of E. coli ΔhldE-
derived apo-ALPK1-(N+K) to ADP-LD-Hep (but not S7P) was readily
detected (Extended Data Fig. 5h). 
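
As a rough worked check of the mass argument above, the expected average mass of one ADP-Hep molecule can be compared with the observed 619.32 Da shift. This sketch assumes the molecular formula C17H27N5O16P2 for ADP-heptose and standard average atomic masses; neither is stated in the text, and the function name is hypothetical.

```python
# Standard average atomic masses in Da (rounded); an assumption of this sketch.
AVERAGE_ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "P": 30.974}

# Assumed molecular formula for ADP-heptose (C17H27N5O16P2).
ADP_HEPTOSE_FORMULA = {"C": 17, "H": 27, "N": 5, "O": 16, "P": 2}

# Mass difference between native and denatured ALPK1-NTD reported in the text.
OBSERVED_SHIFT_DA = 619.32


def average_mass(formula):
    """Sum (atom count x average atomic mass) over the elements of a formula."""
    return sum(AVERAGE_ATOMIC_MASS[element] * count for element, count in formula.items())


calculated_mass = average_mass(ADP_HEPTOSE_FORMULA)
difference = calculated_mass - OBSERVED_SHIFT_DA
```

Under these assumptions the calculated mass is about 619.37 Da, within roughly 0.05 Da of the observed shift, which is consistent with one bound ADP-Hep per ALPK1-NTD molecule.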

We determined the 2.59 A crystal structure of ALPK1-NTD puri- 
fied from wild-type E. coli (Extended Data Table 1a). Each asymmetric 
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Fig. 3 | ADP-Hep binding to ALPK1-NTD and crystal structure of 

the binding complex. a, ALPK1-(N+K) purified from indicated E. coli 
strains was incubated with histidine-tagged TIFA (TIFA—His,) in the 
presence or absence of ADP-LD-Hep. b, Gel-filtration chromatography of 
MBP-ALPKI-NTD purified from wild-type E. coli. b, c, 293T cells were 
treated with small-molecule extracts from fractions 1-5 (b) or from TIFA 
or MBP-ALPK1-NTD (c). Activation of NF-«B was assessed by luciferase 
reporter assay (mean + s.d. from three technical replicates); two-tailed 
unpaired Student's t-test, ***P < 0.001. d, Electrospray ionization mass 


unit contains nine molecules (A-I). Molecule A has the highest-quality 
density map and was used to build the final model, which lacks only the 
first methionine. The structure bears 18 helices (a1 to a18), forming 
seven antiparallel pairs (al-a4, 05-06, a7-08, a9-al0, all-al3, 
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Fig. 4 | Binding of ADP-Hep but not HBP activates ALPK1 in vitro. 

a, b, d, e, Apo-ALPK1-(N+K) (a, d, e) or 293T cell-purified Flag-ALPK1 
(wild-type or indicated mutants; K/M denotes kinase-inactive K1067M 
mutant) (b) were incubated with TIFA—Hisg in the presence of different 
sugars. A titrating amount of HBP was added (d); HBP was pre-treated 
with NMNATI1 or NMNAT3 (e). Proteins were purified from E. coli AhIdE; 
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spectrometry of native and denatured ALPK1-NTD purified from 
wild-type E. coli. e, Structure of the ALPK1-NTD-ADP-Hep complex. 
f, Stereo diagram of the simulated annealing F,— F. omit map of 
ADP-LD-Hep contoured at 30. g, Surface representation of ADP-Hep- 
bound ALPK1-NTD and overview of the binding pocket. Black dashed 
lines, hydrogen bonds. a, c, Anti-pT9-TIFA immunoblotting of TIFA 
phosphorylation. Data are representative of three (a—c) or two (d) 
independent experiments. 


a14-a15, and a16-a17) (Fig. 3e). The «1-a4 pair features an inser- 
tion containing «2, «3 and loop L1. «18, which precedes the L2 tail, 
flanks the outer surface of the «16-a17 pair. The one-by-one-packed 
seven helix pairs form a right-hand solenoid (Fig. 3e), resembling a 
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TIFA phosphorylation was assessed by anti-pT9-TIFA immunoblotting. 
DD, ADP-DD-Hep; LD, ADP-LD-Hep. c, Wild-type or ALPK1~/~ 293T 
cells expressing the indicated ALPK1 mutants were stimulated with ADP- 
LD-Hep. Luciferase assay of NF-kB activation is shown as mean +s.d. 

from three technical replicates (two-tailed unpaired Student's t-test, 

***P <().001). All data are representative of three independent experiments. 
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Fig. 5 | ADP-Hep and B. cenocepacia infection induce ALPK1- 
dependent inflammatory responses in mice. a~c, HBP or ADP-LD-Hep 
(2 mg kg“) was injected into the dorsal air pouches of wild-type (a-c) or 
Alpk1 ~~ mice (c). d, e, Mice were intratracheally infected for 24 h with 
B. cenocepacia (Bc) J2315 (5 x 10’). a, Neutrophil counts in the air 
pouch. b-d, Cytokine concentrations in air pouch washes (b, c) or lung 


tetratricopeptide repeat (TPR) domain structure. The top hits from a 
Dali search were all TPR domains, despite lacking sequence homology 
to ALPK1-NTD (Extended Data Table 1b). 

The concave side of the ALPK1-NTD solenoid has a narrow pocket 
(Fig. 3e). The pocket has high-quality electron densities specifically 
for ADP-LD-Hep (although it is spacious enough for ADP-DD-Hep; 
Fig. 3f). The two ADP phosphates are clamped by R116, R150, R153 
and K233 through a hydrogen-bond network: two bonds between 
R116 (or R153) and 8-phosphate, two between R150 and both phos- 
phates, and one between K233 and a-phosphate (Fig. 3g). The ADP 
adenosine has a stacking interaction with F295 and two hydrogen 
bonds with $236 and T237. The heptose C3 and C4 hydroxyls are 
anchored by Q67 and D231, C2 and Cé are contacted by K233, and 
C7 is fixed by R153. The heptose backbone has a stacking interaction 
with F61. These ADP-Hep-binding residues are conserved in ALPK1 
of other vertebrates (Extended Data Fig. 4f). 

Apo-ALPK1-(N+K) could not phosphorylate TIFA. It was acti- 
vated upon incubation with ADP-LD or DD-Hep, but not S7P, HBP 
or H1P (Figs. 3a, 4a). The inactivity of S7P is consistent with its lack 
of binding to ALPK1-NTD (Extended Data Fig. 5h) and failure to 
induce NF-κB activation (Fig. 1d). Full-length ALPK1 purified from
293T cells also responded to ADP-LD-Hep but not to HBP (Fig. 4b). 
The R116A, R150A, R153A or K233A mutant forms of ALPK1, 
which are expected to have impaired binding to the phosphates of 
ADP-Hep, resisted activation by ADP-Hep (Fig. 4b). Similar results 


homogenates (d) determined by multiplex immunoassay (b, d) or ELISA
(c). e, Numbers of bacteria in lungs. Data shown as mean ± s.e.m. (two-tailed
unpaired Student's t-test, *P < 0.05, **P < 0.01, ***P < 0.001; NS,
not significant). n (biologically independent animals) = 5 in a–c, 9 for
saline/wild-type, 7 for saline/Alpk1−/− and 8 for B. cenocepacia in d, e. All
data are representative of two independent experiments.


were obtained with Q67A/D231A, T237E/F295K and F61D muta- 
tions, which affect heptose or adenosine binding. These ALPK1 
mutants could not mediate activation of NF-κB by ADP-LD-Hep
(Fig. 4c). The requirement of AMP-contacting residues for activation 
of ALPK1 explains why HBP is inactive in vitro. Thus, ALPK1 is a 
functional receptor specifically for ADP-Hep. 

HBP could bind apo-ALPK1-(N+K) but with a lower affinity 
than ADP-Hep (Extended Data Fig. 6a); a 100-fold excess of HBP 
was needed for it to compete with ADP-LD-Hep to activate ALPK1 
(Fig. 4d and Extended Data Fig. 6b). ALPK1-(N+K) incubated with 
ADP-LD-Hep could bind and phosphorylate a GST-fused N-terminal 
15-residue peptide of TIFA and catalyse ATP hydrolysis (Extended 
Data Fig. 6c-e). By contrast, ALPK1-(N+K) incubated with HBP 
showed neither substrate recognition nor catalysis of ATP hydrolysis. 
Chemical cross-linking coupled with mass spectrometry identified 
ten peptide pairs between the NTD and KD molecules of
apo-ALPK1-(N+K) (Extended Data Fig. 6f and Supplementary Table 3).
ALPK1-(N+K) incubated with HBP showed a similar pattern, but 
ALPK1-(N+K) incubated with ADP-LD-Hep had only six crosslinks. 
The four connections to K1149 of ALPK1-KD and one to the nearby 
K1140, which were partially lost in HBP-incubated ALPK1-(N+K), 
all disappeared in ADP-LD-Hep-incubated ALPK1-(N+K). The two 
lysines are predicted to be near the kinase catalytic cleft!’ (Extended 
Data Fig. 6g), which participates in substrate binding. Thus, binding
of ADP-LD-Hep to ALPK1-NTD renders the catalytic cleft more
exposed, probably by inducing a larger conformational change than
the nonproductive HBP binding.

Activation of NF-κB induced by HBP transfection required
phosphorylation of TIFA by ALPK1** (Extended Data Fig. 7a, b). 
We considered how HBP activates ALPK1 in cells. After noticing a 
side reaction of HldE adenylyltransferase with HBP, we found that
host-derived adenylyltransferases—such as nicotinamide nucleotide 
adenylyltransferases—could convert HBP into ADP-heptose 7-P, 
which was competent (albeit less than ADP-Hep) to activate ALPK1 
and the downstream NF-κB response (Fig. 4e and Extended Data
Fig. 7c-k; see Supplementary Text). We also found that ALPK1(Q67A), 
ALPK1(Y68A) or the double mutant lost the response to HBP electro- 
poration but remained competent to mediate ADP-LD-Hep-induced 
activation of NF-κB (Extended Data Fig. 8a, b). Cells expressing
these mutants also showed defective responses to ADP-heptose 7-P 
(Extended Data Fig. 8c). Purified ALPK1(Q67A/Y68A) was acti- 
vated by ADP-LD-Hep but not ADP-heptose 7-P or HBP (Extended 
Data Fig. 8d). Small-molecule extracts from cells electroporated with 
HBP could activate wild-type ALPK1 but not ALPK1(Q67A/Y68A) 
(Extended Data Fig. 8e, f). Thus, cytosolic HBP might be metabo- 
lized into ADP-heptose 7-P to activate ALPK1. Moreover, 293T cells 
expressing ALPK1(Q67A/Y68A) showed a normal NF-κB response to
Y. pseudotuberculosis (Extended Data Fig. 8g), confirming the detection
of ADP-Hep by the host during infection. 

Injection of ADP-LD-Hep, but not of HBP, into the dorsal air pouches 
of mice induced massive neutrophil recruitment (Fig. 5a). Several 
NF-κB-controlled cytokines and chemokines, including IL-6, TNF,
IP-10, MCP-1, MCP-3, IFNγ, GM-CSF, MIP-1α, MIP-1β and RANTES,
were highly elevated in the air pouches (Fig. 5b and Extended Data
Fig. 9a). ADP-LD-Hep also increased the serum levels of GRO-α, IP-10
and MCP-1 (Extended Data Fig. 9b, c). HBP injection affected neither
local nor systemic production of these inflammatory mediators (Fig. 5b
and Extended Data Fig. 9a–c). Alpk1−/− mice injected with
ADP-LD-Hep showed no increase in cytokine production (Fig. 5c and Extended
Data Fig. 9d). Mice were also infected with B. cenocepacia, which is 
known to trigger lung inflammation. Consistent with the cell culture 
data (Extended Data Fig. 3f, g), infection with B. cenocepacia increased 
the expression of MCP-3, GM-CSF, MIP-1α and β, and RANTES in
the lungs of wild-type mice, and these responses were compromised in
Alpk1−/− mice (Fig. 5d). Alpk1−/− mice showed a higher bacterial load
in the lungs than wild-type mice (Fig. 5e). These data emphasize the 
functional relevance of recognition of ADP-Hep by ALPK1. 

We have shown that ADP-Hep is permeable to mammalian cells and 
can be exploited as an immunomodulator or a vaccine adjuvant. The 
pattern recognition receptor function of ALPK1 highlights the versa- 
tile mechanisms underlying cytosolic detection of bacteria’? and will 
stimulate research into other alpha kinases!*. ADP-Hep is present in 
all Gram-negative and some Gram-positive bacteria”°; ALPK1 is also 
widely expressed. Recognition of ADP-Hep by ALPK1, as with recog- 
nition of LPS by TLR4 and caspase-11, represents a generic form of 
innate sensing, which mediates immune responses to diverse bacterial 
pathogens. These include N. meningitidis’, DAEC and ETEC, which are 
extracellular and possess no injection systems, as well as Yersinia spp.,
B. cenocepacia, S. flexneri’ and H. pylori**)”.

METHODS


Plasmids, antibodies and reagents. DNA for hldE was amplified from
Y. pseudotuberculosis IP2666 and inserted into the pBAD24 vector for rescue expression
in Yersinia. DNAs for gmhA, hldE and gmhB were amplified from E. coli DH5α
and cloned into pET28a-6×His-SUMO or pQE-80L vectors for recombinant
expression in E. coli. cDNA for human ALPK1 was amplified from 293T cDNA
reverse transcripts. cDNAs for TIFA and MYD88 were amplified from a HeLa
cDNA library. cDNA for myosin I was from an ORF library from Invitrogen (clone 
ID: IOH29181). For expression in mammalian cells, cDNAs of indicated genes 
were cloned into the pCS2-3 x Flag or pCS2-6 x Myc vectors for transient transfec- 
tion, and for stable expression the cDNAs were inserted into the FUIPW lentiviral 
vector with an N-terminal eGFP or mCherry tag. For recombinant expression 
in bacteria, indicated ALPK1 cDNA fragments were inserted into the pMAL- 
c2X, pQE-80L, pGEX-6p-2, pET28a-6 x His-SUMO or pACSUMO (the origin 
of pET28a-6 x His-SUMO was replaced with p15A derived from pACYC vector) 
vectors, and TIFA cDNA was cloned into pET-22b vector with a C-terminal 6 x His 
tag. cDNAs for human NMNAT1 (isoform 1) and NMNAT3 (isoform 4) were
synthesized by our in-house gene synthesis facility and cloned into the pGEX-6p-2
vector for recombinant expression in bacteria. The NOD1 and NOD2 expression 
plasmids were kindly provided by G. Nunez (University of Michigan) and the 
TRAF6 expression plasmid was previously described”’. Flag-tagged AIDA-I frag- 
ment (GST-AIDA-I5-600-Flag) and AAH expression plasmids have also previously 
been described’. The pNF-κB-eGFP reporter plasmid (pNL2.2-BII-5RE-eGFP)
used in this study was generated by inserting five copies of κB sites into the multiple
cloning sites in pNL2.2 (Promega) and at the same time replacing the Nluc cDNA 
with that of eGFP. All luciferase assay plasmids were described in our previous 
publications** *°. The lentiCas9-Blast and lentiGuide-Puro plasmids used for gen- 
erating the knockout cells were obtained from Addgene. All truncations and point 
mutations were generated by the standard PCR cloning strategy. All plasmids were 
verified by DNA sequencing. 

The anti-AAH antibody has previously been described’. The rabbit polyclonal 
antibody for ALPK1 was from GeneTex (#GTX87015). Antibodies for tubulin 
(T5168) and Flag (M2) were from Sigma-Aldrich. The anti-Myc monoclonal anti- 
body (9E10) was from Covance. The rabbit polyclonal antibody (anti-pT9-TIFA) 
against T9-phosphorylated human TIFA was developed by Abcam as a collabora- 
tive project using a synthetic phospho-peptide antigen, from which a monoclonal 
antibody was generated (ab214815). 

HBP, ADP-DD-Hep and ADP-LD-Hep have previously been synthesized?”*.
D-Sedoheptulose 7-phosphate (S7P) (#78832) was purchased from
Sigma-Aldrich. Recombinant human TNF (#rcyc-htnfa), C12-iE-DAP (#tlrl-cl2dap),
and muramyl dipeptide (MDP) (#tlrl-mdp) were InvivoGen products. ELISA kits
for human IL-8, mouse IL-6 and MCP1 were purchased from Dakewe Biotech;
ELISA kits for mouse GRO-α and mouse IP-10 were from R&D Systems and
Neobioscience, respectively. All other chemical reagents used were from Sigma- 
Aldrich unless noted. 
Bacterial strains and infection. Y. pseudotuberculosis IP2666 Δ6 lacking six T3SS
effector proteins (YopH, YopE, YopM, YopO, YopJ and YopT) was provided by
R. R. Isberg (Tufts University School of Medicine). S. flexneri 2a strain 2457T,
DAEC 2787, ETEC H10407 and B. cenocepacia J2315 were as described???*°, 
E. coli BL21 (DE3) strain was used for recombinant protein expression. Bacterial 
deletion mutants were generated by using the λ Red recombineering technology
as previously described*". For complementation of Y. pseudotuberculosis IP2666
Δ6/ΔhldE, 0.2% L-arabinose was used to induce HldE expression. To determine
the biosynthesis of ADP-Hep in Y. pseudotuberculosis mutants, plasmids expressing 
GST-AIDA-Is50_600-Flag and AAH were transformed into the bacteria, and 
heptosylation of AIDA-I was detected with the ECL glycoprotein detection kit (GE 
Healthcare) as previously described’. Low calcium-induced type III secretion assay 
was performed by following an established protocol”. 

For cell culture infection, bacteria were cultured at 30°C (for Y. pseudotuberculosis)
or 37°C (for S. flexneri) in Luria-Bertani (LB) broth with shaking until OD600
reached 1.5. 293T cells were seeded in 96- or 24-well plates and cultured for 16 h
before the infection (MOI 50). The infection was facilitated by centrifugation at
800g for 5 min at room temperature. One and a half hours after the infection, the
cell culture medium was replaced with fresh Dulbecco's modified Eagle's medium
(DMEM) containing 34 μg ml⁻¹ chloramphenicol to prevent over-proliferation of
both extra- and intracellular bacteria. Infected cells were further cultured for 12 h
or 4 h before being subjected to NF-κB luciferase/eGFP reporter assays or indicated
immunoblotting assays, respectively. 

Bacterial transposon screen. Ampicillin-resistant Y. pseudotuberculosis IP2666 Δ6
was obtained by transformation with the pBAD24-mCherry plasmid. The Himar1
mariner transposon vector pSC123*? (kindly provided by A. Rietsch, Case Western
Reserve University and S. Lory, Harvard Medical School) was used to generate the
mutant library of Y. pseudotuberculosis IP2666 Δ6. In brief,
pBAD24-mCherry-transformed Y. pseudotuberculosis IP2666 Δ6 was mated with E. coli DH5α
(λ pir) containing pSC123 in the presence of the MT607 helper strain. The mating
mixtures were plated onto the LB agar containing ampicillin and kanamycin for
counter selection. Single transconjugants were randomly picked into 96-well plates
containing 80 μl of LB broth for culture and long-term storage. Approximately
21,000 mutant clones were collected to generate the transposon library. For the
screen, 293T cells stably transfected with pNF-κB-eGFP reporter plasmid were
seeded into 96-well plates and cultured overnight to reach nearly 90% confluency.
The transposon mutant culture (OD600 1.5–2) was added onto the cells (3 μl per
well) and the infection was facilitated by centrifugation at 800g for 5 min. One and
a half hours after infection, the culture medium was replaced with fresh DMEM
containing 34 μg ml⁻¹ chloramphenicol. Ten hours later, the infected cells were
fixed in 4% paraformaldehyde for 30 min and then incubated in PBS (60 μl per
well) for eGFP fluorescence measurement using a microplate reader. The candidate
mutant strains obtained from the eGFP reporter screen were further screened using
the NF-κB luciferase reporter assay.

Cell culture, luciferase reporter assay and microscopy. 293T and THP-1 cells 
were obtained from the American Type Culture Collection (ATCC). 293T cells 
were grown in DMEM supplemented with 10% (v/v) fetal bovine serum (FBS) and 
2 mM L-glutamine. THP-1 cells were grown in RPMI 1640 medium containing
10% FBS and 2 mM L-glutamine. Knockout 293T cell lines were generated by using
the CRISPR-Cas9 method as recently described*# and sequences for gRNAs targeting
ALPK1 and TIFA are listed in Supplementary Table 2. All cells were tested for
mycoplasma using the standard PCR method. Cell identity was checked frequently 
by their morphological features but was not authenticated by short tandem repeat 
(STR) profiling. For luciferase assay, the plasmids were transfected into 293T cells 
with the Jetprime reagents (Polyplus). Luciferase activity was determined using the 
dual luciferase assay kit (Promega) according to the manufacturer's instructions. 
For fluorescence microscopy imaging, 293T cells expressing the pNF-κB-eGFP
reporter were seeded onto glass coverslips in 24-well plates and cultured for 16h 
before transfection or infection (MOI 50). The cells were fixed and stained with 
Hoechst 33342. Fluorescence images were acquired on the Nikon A1-R or Zeiss 
Meta confocal microscope. 

Immunoprecipitation and qRT-PCR. For immunoprecipitation, 293T cells at 
a confluency of 70–80% in 6-well plates were transfected with a total of 2.5 μg of
indicated plasmids. Twenty-four hours after transfection, cells were treated with
or without ADP-LD-Hep (100 μM) for 4 h. The cells were washed once in PBS and
lysed in buffer containing 50 mM Tris-HCl (pH 7.5), 150 mM NaCl, 2 mM EDTA 
and 1% Triton X-100, supplemented with a protease inhibitor cocktail (Roche 
Molecular Biochemicals). Pre-cleared cell lysates were then subjected to anti-Flag 
M2 immunoprecipitation following the manufacturer’s instructions. The beads 
were washed four times with the lysis buffer and the immunoprecipitates were sub- 
jected to standard immunoblotting analysis. RT-PCR was performed as previously 
described**. mRNA levels of the target gene were normalized to that of 18S rRNA. 
The primers used for human IL8 are 5′-AATCTGGCAACCCTAGTCTGCTA-3′
(forward) and 5′-AAACCAAGGCACAGTGGAACA-3′ (reverse), and those for
human 18S rRNA are 5′-GACTCATTGGCCCTGTAATTGGAATGAGTC-3′ (forward)
and 5′-CCAAGATCCAACTACGAGCTT-3′ (reverse), both of which are
the same as previously described®. 
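
The normalization of target mRNA to 18S rRNA described above can be sketched in a few lines of Python. This is only an illustration under the assumption that the common 2^(−ΔΔCt) approach was used, which the text does not state; the function name and the Ct values in the usage note are hypothetical.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Fold change of a target mRNA relative to a control sample (2^-ddCt).

    ct_target / ct_reference: Ct values of the target gene (e.g. IL8) and the
    reference gene (e.g. 18S rRNA) in the treated sample.
    ct_target_ctrl / ct_reference_ctrl: the same two values in the control sample.
    Assumes roughly 100% amplification efficiency for both primer pairs.
    """
    delta_ct_sample = ct_target - ct_reference             # normalize to reference
    delta_ct_control = ct_target_ctrl - ct_reference_ctrl  # same for the control
    delta_delta_ct = delta_ct_sample - delta_ct_control
    return 2.0 ** (-delta_delta_ct)
```

With hypothetical Ct values, `relative_expression(22.0, 10.0, 25.0, 10.0)` evaluates to 8.0, that is, an eightfold induction relative to the control.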

FACS-based genome-wide CRISPR-Cas9 screen. Human CRISPR knock- 
out pooled gRNA plasmid library (GeCKO v2) encompassing 123,411 different 
gRNAs targeting 19,050 human genes was generated by the Zhang laboratory” 
and obtained from Addgene. Amplification of the library and preparation of the 
lentivirus were performed as recently described*‘. To perform the screen, 293T 
cells stably expressing Cas9 and pNF-«B-eGFP reporter were seeded in the 
15-cm dish (7.5 x 10° cells per dish) and a total of 1.5 x 10° cells were infected 
with the gRNA lentivirus library at an MOI of 0.3. Forty-eight hours after infection,
cells were re-seeded and selected with 1 μg ml⁻¹ puromycin. After 6 days,
2 x 108 puromycin-resistant cells were left untreated to obtain the control sample.
About 2.5 x 10° puromycin-resistant cells were treated overnight with 10 μM
ADP-LD-Hep and sorted for eGFP-negative cells (~15%) on a BD Biosciences 
FACSAria II Flow Cytometer. After culturing for about one week, the sorted cells 
were re-treated with ADP-LD-Hep until the percentage of eGFP-negative cells 
reached more than 95%. The resulting ADP-LD-Hep unresponsive cells were cultured
and treated overnight with 20 ng ml⁻¹ TNF followed by FACS sorting of the
eGFP-positive cells. After expansion, the TNF-responsive cells, together with the 
control cells, were subjected to DNA extraction*™. Two parallel screens were per- 
formed. Amplification of the gRNA sequences was performed using a two-step 
PCR method similar to that described recently**. In the first step, twenty-four
50-μl PCR reactions (each containing 10 μg of genomic DNA) were performed
with the forward primer 50bp-F
(5′-CTCTTTCCCTACACGACGCTCTTCCGATCTCTTGTGGAAAGGACGAAACA-3′)
and the reverse primer 50bp-R2
(5′-GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCTTCTCAAGATCTAGTTACGCC-3′);
the PCR program used is 95°C for 3 min, 18 cycles of 95°C
for 30 s, 56°C for 20 s and 68°C for 20 s, and a final 3-min extension at 68°C.
Products of the first-step PCR were pooled together and used as the template
for the second-step PCR. Four 50-μl PCR reactions (each containing 2.5 μl of
the first-step PCR product) were performed with the forward primer Index-F
(5′-AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACG-3′) and
one of the reverse primers: Index-R1
(5′-CAAGCAGAAGACGGCATACGAGATCGTGATGTGACTGGAGTTC-3′) for the control sample, Index-R4
(5′-CAAGCAGAAGACGGCATACGAGATTGGTCAGTGACTGGAGTTC-3′) and Index-R8
(5′-CAAGCAGAAGACGGCATACGAGATTCAAGTGTGACTGGAGTTC-3′) for the screen samples.
The PCR program used is 95°C for
3 min, 21 cycles of 95°C for 30 s, 56°C for 20 s and 68°C for 20 s, and a final 3-min
extension at 68°C. The products of the second-step PCR reactions were subjected
to electrophoresis on 2% agarose gel; the 274-bp DNA bands were extracted and
sequenced at the HiSeq2500 platform (Illumina). The first 19 nucleotides from 
each sequencing read are the gRNA sequence recovered. The frequency of each 
gRNA was obtained by dividing the gRNA read number by the total sample read 
number; the fold change was calculated by comparing the frequency of each gRNA 
in the screen sample with that in the control sample. The fold change ranking was 
obtained based on the smaller fold change value of each individual gRNA in the 
two parallel screens. 
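
The read-counting and ranking steps described above (first 19 nucleotides as the gRNA, frequency as read count over total reads, fold change versus the control, ranking by the smaller fold change of the two parallel screens) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: all function names are hypothetical, the total read number is taken here as the number of reads long enough to yield a gRNA, and gRNAs absent from the control are simply skipped.

```python
from collections import Counter

GRNA_LENGTH = 19  # per the text: the first 19 nucleotides of each read


def count_grnas(reads):
    """Count gRNA sequences recovered from raw sequencing reads."""
    return Counter(r[:GRNA_LENGTH] for r in reads if len(r) >= GRNA_LENGTH)


def frequencies(counts):
    """Frequency of each gRNA = its read number / total sample read number."""
    total = sum(counts.values())
    return {grna: n / total for grna, n in counts.items()}


def fold_changes(screen_counts, control_counts):
    """Fold change = frequency in the screen sample / frequency in the control."""
    screen_freq = frequencies(screen_counts)
    control_freq = frequencies(control_counts)
    # Assumption: gRNAs with zero control frequency are left out.
    return {g: screen_freq.get(g, 0.0) / f for g, f in control_freq.items() if f > 0}


def rank_by_smaller_fold_change(fc_screen_a, fc_screen_b):
    """Rank gRNAs by the smaller of their two fold changes, highest first."""
    shared = fc_screen_a.keys() & fc_screen_b.keys()
    scored = {g: min(fc_screen_a[g], fc_screen_b[g]) for g in shared}
    return sorted(scored.items(), key=lambda item: item[1], reverse=True)
```

Taking the smaller of the two fold changes is a conservative choice: a gRNA ranks highly only if it is enriched in both parallel screens.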
Recombinant protein purification. Protein expression was induced in E. coli BL21 
(DE3) strain (wild-type or ΔhldE) at 20°C for 16 h with 0.5 mM
isopropyl-β-D-thiogalactopyranoside (IPTG) after OD600 of the bacterial culture reached
0.8. Affinity purification of MBP-ALPK1-NTD and MBP-TIFA was performed
using amylose resin. His-tagged SUMO-GmhA, SUMO-HldE, GmhB, ALPK1-
NTD, and TIFA proteins were purified using Ni-NTA agarose resin (Qiagen). The 
proteins were further purified by HiTrap Q HP ion-exchange chromatography and 
gel filtration chromatography (GE Healthcare Life Sciences). Five per cent glycerol 
was used throughout the purification process for TIFA protein. To obtain the ALPK1- 
(N+K) complex, pGEX-6p-2 vector containing human ALPK1 (1-473) and
pACSUMO-ALPK1 (959-1244) were transformed into E. coli BL21 (DE3). Cells were grown
at 37°C in LB medium containing 100 μg ml⁻¹ ampicillin and 30 μg ml⁻¹ kanamycin;
when OD600 reached 0.8, 0.5 mM IPTG was added to induce protein expression at
20°C for 16 h. Cells were collected by centrifugation at 4,700g, resuspended in lysis 
buffer containing 20 mM Tris-HCl (pH 8.0) and 300 mM NaCl, and then lysed 
with an ultrasonic cell disruptor. GST-ALPK1-(N+K) complexes were purified 
by glutathione sepharose affinity chromatography. GST was removed by overnight 
digestion with the homemade HRV 3C protease at 4°C. The ALPK1-(N+K) complex
was concentrated to 0.1 mg ml⁻¹ for subsequent biochemical assays. Human
NMNAT1 and NMNAT3 were expressed in E. coli BL21 (DE3) ΔhldE strain, and
both proteins were purified in the same way as for ALPK1-(N+K), with an additional 
step of Superdex G200 gel filtration chromatography (GE Healthcare Life Sciences). 
To obtain proteins for structure determination, pET28-His6-SUMO-ALPK1
(1-451 or 1-446) was transformed into E. coli BL21 (DE3). Proteins were expressed 
and purified similarly as for ALPK1-(N+K) complexes except that a Ni-Sepharose 
column was used for affinity purification. SUMO was removed by overnight digestion
with homemade ULP1 protease at 4°C. The untagged protein was further
purified by HiTrap Q anion-exchange and Superdex G75 gel filtration chroma- 
tography. Selenomethionine-substituted (SeMet) ALPK1 (1-451) was expressed 
in the methionine auxotrophic E. coli strain B834 (DE3) and purified by the same 
procedure as for the native protein. 
Bio-layer interferometry. The bio-layer interferometry (BLI) assay was performed 
on Octet RED96 System (ForteBio) at 29°C. The running buffer was 20 mM 
HEPES (pH 7.5), 150 mM NaCl and 0.02% Tween-20. Ligand-free ALPK1- 
(N+K) complexes purified from E. coli BL21 (DE3) ΔhldE were biotinylated
using EZ-Link Sulfo-NHS-LC-Biotinylation Kit (Thermo Fisher Scientific) for 
30 min and then immobilized onto a Super Streptavidin (SSA) biosensor that 
had been equilibrated in the running buffer for 10 min. The SSA biosensor was 
transferred into buffer containing ADP-LD-Hep, HBP or S7P at the indicated 
concentration (for association) or the analyte-free buffer (for dissociation). The 
data were analysed and the binding constant was determined using software 
provided by Fortebio (Data Analysis 8.5). 
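
For orientation, the relationship between BLI sensorgrams and a binding constant can be sketched with the standard 1:1 (Langmuir) interaction model. The text does not state which model the vendor software fitted, so the model choice, the function names and all parameter values below are assumptions for illustration only.

```python
import math


def dissociation_constant(k_on, k_off):
    """Equilibrium dissociation constant K_D = k_off / k_on (1:1 binding)."""
    return k_off / k_on


def association_response(t, conc, k_on, k_off, r_max):
    """Sensor response during association with analyte at concentration conc."""
    k_obs = k_on * conc + k_off                    # observed rate constant
    r_eq = r_max * conc / (conc + k_off / k_on)    # plateau at this concentration
    return r_eq * (1.0 - math.exp(-k_obs * t))


def dissociation_response(t, r_start, k_off):
    """Sensor response after transfer into analyte-free buffer."""
    return r_start * math.exp(-k_off * t)
```

In this model a lower-affinity ligand shows a larger K_D, so a higher concentration is needed to reach the same plateau response.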
Enzymatic synthesis of H1P and ADP-heptose 7-P. For enzymatic synthesis of 
H1P, 1 mM HBP and 10 μM GmhB protein purified from E. coli BL21 (DE3)
ΔhldE were reacted in 1 ml of buffer containing 20 mM Tris-HCl (pH 8.0) and
10 mM MgCl2 for 2 h at 37°C. The reaction product H1P (anion of m/z 289) was
purified by HPLC-MS. For enzymatic synthesis of ADP-heptose 7-P, GmhA and
HldE proteins were purified from E. coli BL21 (DE3) ΔhldE, and the synthesis was
performed with 1 mM S7P, 4 mM ATP, 10 μM GmhA, and 5 μM HldE in 20 mM
HEPES (pH 7.4), 20 mM KCl and 100 mM MgCl2 at 30°C for 4 h. The reaction
was stopped by incubation at 95°C for 5 min, and then centrifuged at 13,000 rpm 
for 5 min to remove protein precipitates. ADP-heptose 7-P (anion of m/z 698) was 
purified from the reaction by HPLC-MS as described below. 
HPLC-MS analysis, fractionation and LC-MS/MS. Enzymatic reactions and 
extracts of E. coli-purified His6-ALPK1-NTD were subjected to HPLC-MS
analysis or fractionation performed on Waters HPLC (Column: Atlantis T3; eluents:
0.1% NH4HCO3/H2O and CH3CN) with 2998PDA and 3100MS detectors (ESI ionization).
Liquid chromatography with tandem mass spectrometry (LC-MS/MS)
analysis was performed on an Agilent 1290 Infinity HPLC coupled with an Agilent 
6540 quadrupole time of flight mass spectrometer. A Phenomenex Kinetex F5 
column (2.1 × 100 mm, 2.6 μm) was used for separation. Mobile phases A and B
were 0.1% formic acid-containing water and acetonitrile, respectively. Column
temperature was set to 35°C and the flow rate was 0.4 ml min⁻¹. The following
gradient was applied: 0-4 min 0% B, 4-6.5 min from 0% to 50% B, 6.5-6.6 min from 
50% to 100% B, 6.6-8 min 100% B, 8-8.2 min from 100% to 0% B, and 8.2-10 min 
re-equilibration at 0% B. Six microlitres of each sample was injected into the 
instrument and the mass spectrometry data were collected in positive and nega- 
tive ionization modes for detecting ADP-Hep and ADP-heptose 7-P, respectively. 
A collision energy of 20 V was applied for the MS/MS acquisition of ADP-Hep. 
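
The LC gradient listed above is piecewise linear, so the mobile-phase composition at any time point can be obtained by interpolation between the stated breakpoints. The sketch below encodes those breakpoints; the names `GRADIENT` and `percent_b` are illustrative and do not correspond to any instrument software.

```python
# (time in min, % mobile phase B) breakpoints taken from the gradient in the text
GRADIENT = [
    (0.0, 0.0),
    (4.0, 0.0),     # 0-4 min: 0% B
    (6.5, 50.0),    # 4-6.5 min: 0% to 50% B
    (6.6, 100.0),   # 6.5-6.6 min: 50% to 100% B
    (8.0, 100.0),   # 6.6-8 min: 100% B
    (8.2, 0.0),     # 8-8.2 min: 100% to 0% B
    (10.0, 0.0),    # 8.2-10 min: re-equilibration at 0% B
]


def percent_b(t_min, gradient=GRADIENT):
    """Return % B at time t_min by linear interpolation between breakpoints."""
    if not gradient[0][0] <= t_min <= gradient[-1][0]:
        raise ValueError("time is outside the gradient program")
    for (t0, b0), (t1, b1) in zip(gradient, gradient[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise ValueError("gradient breakpoints are not contiguous")
```

For example, `percent_b(5.25)` falls halfway through the 0% to 50% ramp and returns 25.0.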
Native mass spectrometry analysis. His6-ALPK1-NTD protein purified from
E. coli BL21 (DE3) was used for native-ESI analysis. Purified recombinant proteins
(10 μM) were buffer-exchanged into 100 mM ammonium acetate (pH 7.5) using a
centrifugal buffer exchange column (Micro Bio-Spin 6, Bio-Rad), and one aliquot 
was denatured by adding 0.1% formic acid (final concentration). Both the native 
and denatured protein samples were analysed by direct infusion. Specifically, 3 μl
of each protein sample was loaded into a nano-flow capillary (borosilicate emit- 
ters, Thermo Scientific) and sprayed into a high-resolution mass spectrometer 
Q-Exactive HF through the nanospray FLEX ion source. The mass spectrometer 
settings were: 0.8-1.2 kV for capillary voltage; S-lens RF level of 100, SID (100) for 
complete desolvation of the native protein sample; capillary temperature at 150°C 
for native protein samples or 200°C for denatured samples; scan range 2,000- 
6,000 m/z for native samples and 800-6,000 for denatured samples; intact protein 
mode with trapping gas pressure set as 0.2. Mass spectra were analysed using the 
Thermo Scientific Protein Deconvolution software*’. The parameters were spec- 
ified according to the mass spectrometer settings. The minimum adjacent range 
of charges was 3-6 for native proteins or 7-20 for the denatured proteins, and 
mass tolerance was 30 p.p.m. The deconvoluted mass of the most abundant ion 
was selected as the mass of the target protein. The mass of the bound ligand was 
calculated as the difference between the native protein and the denatured protein. 
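
The ligand-mass calculation above is a simple subtraction of deconvoluted masses. A minimal sketch follows; the function names and the masses in the usage note are hypothetical, and applying the 30 p.p.m. tolerance to a comparison of two protein masses is an assumption made here for illustration rather than a step stated in the text.

```python
def bound_ligand_mass(native_mass_da, denatured_mass_da):
    """Mass of the bound ligand = native protein mass - denatured protein mass."""
    return native_mass_da - denatured_mass_da


def within_ppm(observed_da, expected_da, tolerance_ppm=30.0):
    """True if observed and expected masses agree within tolerance_ppm."""
    error_ppm = abs(observed_da - expected_da) / expected_da * 1e6
    return error_ppm <= tolerance_ppm
```

With hypothetical deconvoluted masses of 50,619.0 Da (native) and 50,000.0 Da (denatured), `bound_ligand_mass` returns 619.0 Da for the ligand.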
Ultra-performance liquid chromatography with tandem mass spectrometry 
analysis of ADP-heptose 7-P generated from HBP by NMNAT1. The Dionex 
Ultimate 3000 UPLC system was coupled to a TSQ Quantiva Ultra triple- 
quadrupole mass spectrometer (Thermo Fisher), equipped with a heated electrospray
ionization probe. Samples were separated using a Synergi Hydro-RP column
(2.0 × 100 mm, 2.5 μm, Phenomenex). A binary solvent system (mobile phase A,
10 mM tributylamine adjusted with 15 mM acetic acid in water; mobile phase
B, methanol) was used. A 15-min gradient with a flow rate of 250 μl min⁻¹ was
applied as follows: 1-7 min at 15% B; 7-12 min, 15-98% B; 12-12.1 min, 98-15% B; 
12.1-15 min, 15% B. The column chamber and sample tray were held at 45°C and 
10°C, respectively. Data were acquired in the selected reaction monitoring mode 
for ADP-heptose 7-P with transitions of 698/351 in the negative ion mode. Both 
the precursor and fragment ions were collected at the resolution of 0.7 FWHM. 
The source parameters are as follows: spray voltage, 3,000 V; ion transfer tube 
temperature, 350°C; vaporizer temperature, 300°C; sheath gas flow rate, 35 Arb; 
auxiliary gas flow rate, 12 Arb; CID gas, 1.5 mTorr. Data analysis and quantification 
were performed using the software Xcalibur 3.0.63 (Thermo Fisher). 
Chemical cross-linking coupled with mass spectrometry. To obtain well-behaved 
apo-ALPK1-(N+K) complexes for chemical cross-linking coupled with mass spec- 
trometry analysis, GST-ALPK1-NTD was co-expressed with SUMO-ALPK1-KD 
in E. coli BL21 (DE3) ΔhldE. The GST-ALPK1-(N+K) complexes were immobilized
on a glutathione sepharose column and washed with 20 bed volumes of 20 mM
Tris-HCl (pH 8.0) containing 500 mM NaCl. GST was removed by overnight diges- 
tion with a homemade HRV 3C protease at 4°C. The supernatant containing the 
released apo-ALPK1-(N+K) was passed through fresh glutathione sepharose beads, 
further purified by gel filtration chromatography, and concentrated to 1 mg ml⁻¹.
About 10 μg of apo-ALPK1-(N+K) complex incubated with or without 0.1 mM
of the indicated sugar molecule was crosslinked with 0.5 mM disuccinimidyl suberate
(DSS) at 25°C for 40 min, and the reaction was quenched by 20 mM NH4HCO3.
Proteins were precipitated with ice-cold acetone, resuspended in 8 M urea in 100 mM 
Tris-HCl (pH 8.5), digested sequentially with rLys-C (Promega) for 2 h and trypsin 
(in 2 M urea and 100 mM Tris-HCl, pH 8.5) for 12 h. LC-MS/MS analyses were 
performed on an Easy-nLC 1000 II HPLC (Thermo Fisher Scientific) coupled to 
a Q-Exactive HF mass spectrometer (Thermo Fisher Scientific). Peptides were 
loaded on a pre-column (75 μm ID, 6 cm long, packed with ODS-AQ 120 Å-10 μm
beads from YMC) and further separated on an analytical column (75 μm ID, 12 cm
long, packed with C18 1.8 μm 100 Å resin from Welch Materials) using a linear
reverse-phase gradient from 100% buffer A (0.1% formic acid in H2O) to 28%
buffer B (0.1% formic acid in acetonitrile) in 56 min at a flow rate of 220 nl min⁻¹.
The top 15 most intense precursor ions from each full scan (resolution 60,000)
were isolated for HCD MS2 (resolution 15,000; normalized collision energy 27)
with a dynamic exclusion time of 45 s. Precursors with 1+, 2+, 7+ or above, or 
unassigned charge states were excluded. Each sample was analysed twice and the 
two technical repeats were combined for data analysis. pLink software*® was used 
to identify cross-linked peptides with precursor and fragment ion mass accuracy at 
20 p.p.m., and the results were filtered by applying a 5% FDR cutoff at the spectral 
level and then an E-value cutoff at 0.001. The inter-domain crosslinks were further 
filtered by requiring >3 spectra and the best E-value < 1.0 x 10°” in at least one of
the samples to eliminate false identifications”. 
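
The final filtering step for inter-domain crosslinks (a minimum spectral count and a best E-value cut-off, met in at least one sample) can be sketched as below. This is not pLink code: the `CrosslinkHit` record, the function names and the example data are hypothetical, and both cut-offs are passed in as parameters rather than fixed, with the spectral cut-off applied as "at least `min_spectra`".

```python
from dataclasses import dataclass


@dataclass
class CrosslinkHit:
    peptide_pair: str    # label of the crosslinked residue pair (hypothetical format)
    sample: str          # e.g. "apo", "HBP" or "ADP-LD-Hep"
    spectra: int         # number of spectra supporting the pair in this sample
    best_e_value: float  # best E-value for the pair in this sample


def passes_in_sample(hit, min_spectra, max_best_e_value):
    """Apply both cut-offs to one peptide pair in one sample."""
    return hit.spectra >= min_spectra and hit.best_e_value < max_best_e_value


def filter_interdomain(hits, min_spectra, max_best_e_value):
    """Keep peptide pairs that pass both cut-offs in at least one sample."""
    kept = {
        h.peptide_pair
        for h in hits
        if passes_in_sample(h, min_spectra, max_best_e_value)
    }
    return sorted(kept)
```

A pair that is well supported in one condition is retained even if it is absent from the others, which is what allows the loss of a crosslink upon ligand binding to be observed.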
In vitro ALPK1 kinase assay. Two hundred nanograms of ALPK1-(N+K) complex
purified from E. coli BL21 (DE3) ΔhldE or 3×Flag-ALPK1 (full-length)
immunopurified from ALPK1−/− 293T cells were mixed with 2 μg TIFA-His6
protein (also purified from the ΔhldE strain) in a 50-μl reaction containing
45 mM HEPES (pH 7.4) and 4 mM MgCl2. ALPK1 K/M and TIFA T9A mutant
proteins were used as negative controls. ADP-LD-Hep or HBP (20 μmol) was
added to test their ability to activate ALPK1. Kinase reactions were initiated by
adding 100 μM ATP and allowed to proceed for 30 min or 1 h at 30°C. To assay
HBP and its activation by NMNAT-mediated modification, 5 μg of NMNAT1 or
NMNAT3 protein purified from E. coli BL21 (DE3) ΔhldE were mixed with 20 μM
HBP also in a 50-μl reaction in the presence of 100 μM ATP; the reaction was
incubated at 30°C for 1 h. Following addition of 200 ng of ALPK1-(N+K) and 2 μg
of TIFA-His6, the reaction was allowed to proceed for another 1 h. The reactions
were stopped by adding 4× SDS loading buffer and subjected to SDS-PAGE analyses.
Phosphorylated TIFA-His6 protein was detected by immunoblotting using
the anti-pT9-TIFA antibody as described above. 
Protein crystallization and structure determination. Purified proteins were concentrated
to 10 mg ml⁻¹ for crystallization screens at 20°C using the sitting-drop
vapour diffusion method. The drop, containing 1 μl of protein solution and
1 μl of reservoir solution, was equilibrated over 100 μl reservoir solution. Initial
crystallization hits of SeMet-labelled ALPK1 (1-451) and native ALPK1 (1-446) 
appeared from the PEG-ion Kit and the Crystal Screen Kit (Hampton Research), 
respectively. Qualified crystals of SeMet-labelled ALPK1 (1-451) were obtained 
in the reservoir solution containing 6% Tacsimate (pH 5.8) and 6.8% PEG 3550 
within 1 week, and the best-diffracted crystals of native ALPK1 (1-446) were 
grown from the reservoir solution containing 0.1 M CH3COONa (pH 4.0) and
1.4 M HCOONa. For data collection, the crystals were soaked in cryoprotectant 
solution containing the reservoir buffer supplemented with 30% ethylene glycol 
for SeMet-labelled ALPK1 (1-451) or 15% glycerol for native ALPK1 (1-446) 
followed by flash-freezing with liquid nitrogen. Diffraction data were collected 
at the Shanghai Synchrotron Radiation Facility (Shanghai) beamline BL18U1 for 
SeMet-labelled ALPK1 (1-451) and BL19U1 for native ALPK1 (1-446) under the 
wavelengths of 0.97776 Å and 0.97853 Å, respectively. Data were processed in X-ray 
Detector Software. The phase was determined by the single wavelength anomalous 
dispersion method and automatic model building was performed in PHENIX. The 
rest of the model was manually built with Coot. The structure of ALPK1-NTD 
(residues 1-446) was refined in PHENIX, and manual modelling was performed 
between refinement cycles. The statistics of data collection and refinement are 
summarized in Extended Data Table 1a. Ramachandran statistics indicated that 
all the residues are in the allowed region, and 98.2% fall into the favoured region. 
The quality of the final model was validated by MolProbity. 
Mouse experiments and measurements of cytokines. Wild-type C57BL/6 mice 
were purchased from Vital River Laboratory Animal Technology (Beijing). To 
generate Alpk1−/− mice, a gRNA (GGCCCTTCGTGCCTGAAAAG) targeting 
exon 3 of Alpk1 was designed with an online gRNA designing tool 
(http://crispr.mit.edu/). In vitro transcribed guide RNA and Cas9 mRNA were co-microinjected 
into C57BL/6 mice-derived zygotes. The tail-end genomic DNA of each offspring 
was amplified with the forward primer 5′-CCTGTAGGGCAGAGTAGGCT-3′ 
and the reverse primer 5′-TTCAAGGTGACAGGTTTCGT-3′. Sanger sequencing 
was performed to analyse the PCR products and identify the founders with 
out-of-frame indels. Founders with the same out-of-frame indels were intercrossed 
to obtain homozygous Alpk1−/− mice (Supplementary Table 2). All mice were 
maintained in the specific pathogen-free facility at National Institute of Biological 
Sciences, Beijing. All mouse experiments were carried out in accordance with the 
national guidelines for housing and care of laboratory animals (Ministry of Health, 
China) and the protocol is in accordance with institutional regulations after review 
and approval by the Institutional Animal Care and Use Committee at National 
Institute of Biological Sciences, Beijing. 
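
The founder-selection criterion described above (keeping only out-of-frame indels) can be illustrated with a minimal Python sketch. It assumes that each allele is summarised as a reference sequence and an edited sequence over the same window; the function names and the three example alleles are hypothetical and are not taken from the study, and the gRNA target is used here only as a convenient reference string rather than the full PCR amplicon.

```python
# Minimal sketch (not the authors' genotyping pipeline): classify an edited
# allele as in-frame or out-of-frame from its net length change.

GRNA_TARGET = "GGCCCTTCGTGCCTGAAAAG"  # gRNA sequence quoted in the Methods


def net_length_change(reference: str, edited: str) -> int:
    """Return the net number of bases gained (+) or lost (-) by the edit."""
    return len(edited) - len(reference)


def is_out_of_frame(reference: str, edited: str) -> bool:
    """An indel shifts the reading frame when its net length is not a multiple of 3."""
    return net_length_change(reference, edited) % 3 != 0


# Hypothetical alleles, built by editing the reference string near its 3' end.
one_bp_deletion = GRNA_TARGET[:16] + GRNA_TARGET[17:]
three_bp_deletion = GRNA_TARGET[:14] + GRNA_TARGET[17:]
two_bp_insertion = GRNA_TARGET[:17] + "TT" + GRNA_TARGET[17:]

for name, allele in [
    ("1-bp deletion", one_bp_deletion),
    ("3-bp deletion", three_bp_deletion),
    ("2-bp insertion", two_bp_insertion),
]:
    status = "out-of-frame" if is_out_of_frame(GRNA_TARGET, allele) else "in-frame"
    print(f"{name}: {status}")
```

In practice the Sanger reads would first be aligned to the amplicon to call the indel; the sketch only covers the frame check that follows.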

Eight-week-old female wild-type or Alpk1−/− C57BL/6 mice were used for air 
pouch model construction, followed by HBP or ADP-LD-Hep administration. Five 
mice were assayed for each group. The air pouch was constructed as previously 
described, with slight modifications. In brief, 3 ml of sterile air was injected 
into the subcutaneous tissue of the back of the mice. Three days later, another 
2.5 ml sterile air was injected into the same pouch to maintain patency. Six days 
later, synthetic HBP or ADP-LD-Hep (2 mg kg⁻¹ body weight) dissolved in 
300 µl of saline was injected into the well-developed pouch (saline alone was 
injected into the control mice). Another 3 h later, blood samples from mouse tail 
end were collected for serum isolation, the mice were then killed, and the pouch 
was opened with a small hole. Immediately afterwards, 600 µl of PBS was injected 
into the pouch, and the fluids were gently sucked into and out of the bulb to mix 
the contents. All fluids within the pouch were collected and centrifuged; the 
supernatants and the cell pellets were used for ELISA analysis and counting the total 
leukocyte number, respectively. Cytokine levels in the sera and air pouch fluid 
supernatants were determined using ELISA kits (IL-6, IP-10, GRO-α and MCP1). 
ProcartaPlex multiplex immunoassay (eBioscience) was also performed to measure 
up to 36 cytokines in both the air pouch washes and the sera. 
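
As a worked example of the dosing arithmetic above, the following Python sketch converts the stated dose (2 mg kg⁻¹ in 300 µl of saline) into a per-mouse mass and a stock concentration. The 20 g body weight is an assumed, illustrative value that is not given in the text, and the function names are hypothetical.

```python
# Minimal sketch of the dose arithmetic; the body weight is an assumed example value.

DOSE_MG_PER_KG = 2.0          # dose stated in the Methods
INJECTION_VOLUME_UL = 300.0   # saline volume stated in the Methods


def injected_mass_ug(body_weight_g: float, dose_mg_per_kg: float = DOSE_MG_PER_KG) -> float:
    """Mass of compound per mouse in micrograms (mg/kg multiplied by g gives ug)."""
    return dose_mg_per_kg * body_weight_g


def stock_concentration_ug_per_ul(body_weight_g: float,
                                  volume_ul: float = INJECTION_VOLUME_UL) -> float:
    """Concentration needed so that the full dose fits in the injection volume."""
    return injected_mass_ug(body_weight_g) / volume_ul


assumed_weight_g = 20.0  # hypothetical body weight, for illustration only
print(f"mass per mouse: {injected_mass_ug(assumed_weight_g):.1f} ug")
print(f"stock concentration: {stock_concentration_ug_per_ul(assumed_weight_g):.3f} ug/ul")
```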

For infection, single clones of B. cenocepacia J2315 strain were cultured at 
37°C in LB broth with shaking for 18 h. The bacterial cultures were diluted by 1:50 
in fresh LB broth, and grown until OD₆₀₀ reached 0.8. The bacteria were washed 
with PBS and then diluted in 1% gelatin-containing PBS. Indicated 6-week-old 
male mice (C57BL/6 background) were anaesthetized by intraperitoneal injection 
of 0.8% pentobarbital sodium and infected intratracheally with 30 µl of bacteria 
suspension (5 × 10⁷ c.f.u. for each mouse) or 1% gelatin-containing PBS. The mice 
were killed 24 h post-infection. The lungs were removed and homogenized to 
measure bacterial burden or cytokine concentration by ProcartaPlex multiplex 
immunoassay (eBioscience Inc.). 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. All data supporting the findings of this study are included in this 
manuscript and its Supplementary Information files. The atomic coordinates and 
structure factors of the ALPK1-NTD-ADP-heptose complex have been deposited 
in the Protein Data Bank under the accession code 5Z2C. 
Extended Data Fig. 1 | Analyses of the ADP-Hep biosynthesis 
pathway and ADP-Hep-induced NF-κB activation. a, Low calcium-induced 
secretion of the T3SS substrates in the hldE transposon 
mutant. yscC is required for T3SS assembly. b, Schematic of the classical 
ADP-Hep biosynthesis pathway in Gram-negative bacteria. c, ADP-Hep-dependent 
autotransporter heptosylation in ΔgmhB mutant. A 
fragment of the AIDA-I autotransporter (GST-AIDA-I₅₀₋₆₀₀-Flag) and 
its heptosyltransferase (AAH) were expressed in Y. pseudotuberculosis 
Δ6 deleted of a gene in the ADP-Hep biosynthesis pathway. AIDA-I 
heptosylation was assessed using the ECL glycoprotein detection kit. 
d, NF-κB activation in 293T cells electroporated with HBP or H1P 
(obtained by treating HBP with recombinant GmhB for the indicated 
times). e, Effects of ALPK1 knockout on other known NF-κB pathways. 
NOD1, NOD2 and MYD88 were transfected into the cells. Wild-type 
cells and two ALPK1−/− 293T clones (KO-1/2) were treated with ADP-LD-Hep 
(100 µM), TNF (20 ng ml⁻¹), C12-iE-DAP (10 ng ml⁻¹) or MDP 
(10 ng ml⁻¹). f, g, Effects of NOD1/2 deficiency on ADP-Hep-induced 
NF-κB activation and TIFA foci formation. Wild-type HeLa cells and two 
NOD1/2 double-knockout clones (DKO-1/2) were assayed. eGFP-TIFA 
was transfected into the cells (g). Scale bar, 20 µm. d-f, NF-κB activation 
was assessed by luciferase reporter assay (mean ± s.d. from three technical 
replicates); two-tailed unpaired Student's t-test was performed (*P < 0.05, 
**P < 0.01, ***P < 0.001; NS, not significant). All data are representative 
of three independent experiments. 
Extended Data Fig. 2 | FACS-based genome-wide CRISPR-Cas9 screen 
identifies the ALPK1-TIFA-TRAF6 axis that mediates ADP-Hep-induced 
NF-κB activation. a, Flow chart of the CRISPR-Cas9 screen for 
genes required for activation of NF-κB induced by extracellular ADP-LD-Hep 
but not TNF in 293T cells. b, Immunoblots of Flag-ALPK1 (wild type 
or its K/M mutant) and Flag-TIFA (wild type or its T9A mutant) expressed 
in knockout 293T cells. c, Requirement of T9 of TIFA for ADP-Hep-induced 
NF-κB activation. Flag-TIFA (wild type or T9A) was transfected 
into two TIFA−/− 293T clones (KO-1/2). d, Requirement of ALPK1 kinase 
activity for ADP-LD-Hep-induced formation of TIFA foci. eGFP-TIFA 
was stably expressed in wild-type or ALPK1−/− 293T cells expressing 
Flag-ALPK1 (wild type or the kinase-dead K/M mutant). Scale bar, 20 µm. 
e, Fluorescence imaging of mCherry-ALPK1 and eGFP-TIFA in 293T 
cells treated with or without ADP-LD-Hep. Scale bar, 10 µm. f, Phos-tag 
gel assay of myosin I phosphorylation in ADP-LD-Hep-treated cells. 
Flag-myosin I was transfected into indicated 293T cells. Immunoblots of 
total cell lysates separated on phos-tag gel or regular SDS gels are shown. 
g, Effects of ALPK1 or TIFA knockout on activation of NF-κB induced 
by ADP-LD or DD-Hep electroporation. ALPK1−/− and TIFA−/− 293T 
cells were complemented as indicated. h, ADP-LD-Hep-induced 
co-immunoprecipitation of TIFA with ALPK1 and TRAF6 in transfected 
293T cells. c, g, NF-κB activation was assessed by luciferase reporter assay 
(mean ± s.d. from three technical replicates); two-tailed unpaired Student's 
t-test was performed (*P < 0.05, **P < 0.01, ***P < 0.001; NS, not 
significant). d, e, Confocal images with Hoechst-stained nuclei. All data 
are representative of three independent experiments. 
Extended Data Fig. 3 | The ALPK1-TIFA pathway mediates T3SS-dependent 
and -independent activation of NF-κB by various bacterial 
pathogens. a, b, Requirement of ALPK1 kinase activity and TIFA 
for Y. pseudotuberculosis-induced activation of NF-κB. Wild-type or 
ALPK1−/− 293T cells expressing Flag-ALPK1 (wild type or K/M) or 
TIFA−/− 293T cells expressing Flag-TIFA were infected with indicated 
Y. pseudotuberculosis strains. NF-κB activation was assessed by luciferase 
reporter assay (a) or eGFP reporter expression (b). Scale bar, 50 µm (b). 
c-e, Requirement of ALPK1 kinase activity for Y. pseudotuberculosis (or 
S. flexneri 2457T)-induced phosphorylation of TIFA (c, d) or formation of 
eGFP-TIFA foci (e). Flag- or eGFP-TIFA was expressed in wild-type or 
ALPK1−/− 293T cells expressing Flag-ALPK1 (wild type or K/M). TIFA 
phosphorylation was assessed by anti-pT9-TIFA immunoblotting (c, d). 
Y. pseudotuberculosis Δ6 was labelled with mCherry (scale bar, 20 µm) 
(e). f, g, NF-κB luciferase reporter activation and formation of eGFP-TIFA 
foci induced by T3SS-negative bacteria. Wild-type, ALPK1−/− or 
Flag-ALPK1-complemented ALPK1−/− 293T cells were infected with 
DAEC 2787, ETEC H10407, B. cenocepacia J2315 or their ΔhldE mutants. 
Scale bar, 20 µm. a, f, Luciferase data are shown as mean ± s.d. from three 
technical replicates; two-tailed unpaired Student's t-test was performed 
(*P < 0.05, **P < 0.01, ***P < 0.001). b, e, g, Confocal images with 
Hoechst-stained nuclei. All data shown are representative of three 
independent experiments. 
Extended Data Fig. 4 | Co-expression of ALPK1-NTD and ALPK1-KD 
can support ADP-Hep and Y. pseudotuberculosis-induced activation 
of NF-κB and conservation of ADP-Hep-binding residues in ALPK1-NTD. 
a, c, Immunoblots of ALPK1 N- and C-terminal truncation mutants 
expression in indicated 293T cells. NF-κB luciferase reporter activation 
induced by ADP-LD-Hep and Y. pseudotuberculosis Δ6 are in Fig. 2h. 
b, Mapping the minimal N- and C-terminal regions of ALPK1 sufficient 
for ADP-LD-Hep-induced activation of NF-κB. Fluorescence images 
of NF-κB-eGFP expression are shown. Scale bar, 50 µm. d, e, Assay of 
ALPK1-NTD and ALPK1-KD co-expression in mediating ADP-LD-Hep 
(d) or Y. pseudotuberculosis (e)-induced TIFA phosphorylation. 
Immunoblots of total cell lysates using indicated antibodies are shown. 
f, Multiple sequence alignment of the NTDs of ALPK1 from indicated 
organisms. Red residues are those involved in ADP-Hep binding, revealed 
by the crystal structure of human ALPK1-NTD-ADP-LD-Hep complex 
(Fig. 3g). Data shown are representative of three (a-c) or two (d, e) 
independent experiments. 
Extended Data Fig. 5 | Direct binding of ADP-Hep to ALPK1-NTD. 
a, E. coli-purified ALPK1-(N+K) complex on a Coomassie blue-stained 
SDS-PAGE gel. b-e, HPLC-MS fractionation, NF-κB-inducing activity 
and mass spectrometry of small-molecule extracts from His₆-ALPK1-NTD 
purified from wild-type E. coli BL21 (DE3). The small-molecule 
extracts, obtained by protein denaturing and precipitation (95°C for 5 
min), were analysed by HPLC-MS with 17 fractions obtained (b). The 17 
fractions were used to treat 293T cells (c) or cells expressing Flag-TIFA 
(d); NF-κB luciferase activity (mean ± s.d. from three technical replicates) 
and anti-pT9-TIFA immunoblotting are in c and d, respectively. e, Mass 
spectrometry of fraction 6 identified three major ions corresponding to 
AMP, ADP and ADP-Hep. f, LC-MS/MS of ADP-Hep in fraction 6 (b, e) 
or synthetic ADP-LD-Hep standard. g, MS/MS spectra of the [M+H]⁺ 
product ions of ADP-Hep in fraction 6 in comparison with those of 
synthetic ADP-LD-Hep. The heptose of ADP-Hep was not shown owing 
to neutral loss. h, BLI assay of ADP-LD-Hep or S7P binding to ALPK1-(N+K). 
ALPK1-(N+K) complexes purified from E. coli BL21 (DE3) 
ΔhldE were biotinylated in vitro. Sensorgrams of the binding to ALPK1-(N+K) 
by different concentrations of the indicated sugar (coloured lines) are 
shown. Grey lines are from model fits. Data shown are representative of 
three (a, h) or two (b-g) independent experiments. 
Extended Data Fig. 6 | HBP binding is insufficient to render 
ALPK1-KD competent for substrate recognition and phosphate 
transfer. a, BLI assay of HBP binding to in vitro-biotinylated apo-ALPK1-(N+K). 
Sensorgrams of the binding in different concentrations of HBP 
(coloured lines) are shown. Grey lines are from model fits. b, Excess HBP 
competitively inhibits activation of ALPK1 by ADP-Hep. Flag-ALPK1 
purified from 293T cells was incubated with purified TIFA-His₆ and 
ADP-Hep in the presence of a titrating amount of HBP. c, d, Effects of 
HBP binding on ALPK1 phosphorylation and recognition of a 15-residue 
peptide substrate derived from TIFA. The N-terminal 15-residue 
sequences of TIFA were fused to GST (GST-TIFA.N15). d, GST-pulldown 
assay of ALPK1-(N+K) binding to GST-TIFA.N15. e, Effect of HBP 
binding on ATP hydrolysis activity of ALPK1. Apo-ALPK1-(N+K) 
(wild type or K/M) was incubated with HBP or ADP-LD-Hep, and further 
reacted with ATP. Percentages of ATP consumption at indicated reaction 
conditions are shown (mean ± s.d. from three technical replicates). 
Two-tailed unpaired Student's t-test was performed (*P < 0.05; **P < 0.01; 
***P < 0.001). f, Graphical representation of DSS-crosslinked residues 
between ALPK1-NTD and ALPK1-KD identified by chemical crosslinking 
coupled with mass spectrometry. Apo-ALPK1-(N+K) was 
left untreated, or incubated with HBP or ADP-LD-Hep. Crosslinking 
connections are depicted by straight lines, and the corresponding raw 
mass spectrometry data are in Supplementary Table 3. g, Crystal structure 
of TRPM7 kinase domain in complex with AMPPNP (PDB code, 
1IA9). TRPM7 is shown in cartoons and AMPPNP is in sticks. The loop 
containing K1727 and I1736 (shown in sticks, corresponding to K1140 and 
K1149 in human ALPK1, respectively) is in yellow. Phosphorylation of 
TIFA and GST-TIFA.N15 was assayed by anti-pT9-TIFA immunoblotting 
(b, c). Apo-ALPK1-(N+K), TIFA-His₆ and GST-TIFA.N15 were purified 
from E. coli BL21 (DE3) ΔhldE (a-f). Data shown are representative of 
three (a-e) and two (f) independent experiments. 
Extended Data Fig. 7 | HBP can be transformed into ALPK1 activation-competent 
ADP-heptose 7-P by bacterial or host adenylyltransferase. 
a, b, Requirement of ALPK1 kinase activity for cytosolic HBP-induced 
phosphorylation of TIFA (a) and activation of NF-κB (b). HBP was 
electroporated into wild-type or ALPK1−/− 293T cells expressing Flag-ALPK1 
(wild-type or K/M). c, d, Enzymatically synthesized HBP could 
induce ALPK1-mediated NF-κB activation (c) and TIFA phosphorylation 
in vitro (d). S7P was reacted with recombinant GmhA or HldE or both. 
Following protein denaturing and precipitation, reaction supernatants 
were added to wild-type or ALPK1−/− 293T cells containing an empty 
vector or Flag-ALPK1 (c). Enzymatically synthesized HBP product 
(HBP*"’Y™"), synthetic ADP-LD-Hep or HBP was incubated with 
ALPK1-(N+K) in the presence of TIFA-His₆ (d). e, Schematic of HldE 
adenylyltransferase synthesis of ADP-Hep and ADP-heptose 7-P from 
H1P and HBP, respectively. f, g, NF-κB activation by (f) and LC-MS of (g) 
enzymatically synthesized HBP product. Indicated reaction products were 
added to wild-type or ALPK1−/− 293T cells (f) or analysed by LC-MS (g). 
h, i, NF-κB (h) and in vitro ALPK1 activation (i) by ADP-heptose 7-P. 
HPLC-purified ADP-heptose 7-P was added to wild-type or ALPK1−/− 
293T cells expressing Flag-ALPK1 (wild-type or K/M). i, Equal amounts 
of the indicated sugars were incubated with ALPK1-(N+K) in the 
presence of TIFA-His₆. j, Ultra-performance liquid chromatography with 
tandem mass spectrometry of the reaction product of HBP and NMNAT1. 
Purified ADP-heptose 7-P, synthesized by reacting HBP with HldE, was 
used as the standard. k, Effect of NMNAT1 overexpression on activation of 
NF-κB by electroporation of HBP into 293T cells. The immunoblots show 
NMNAT1 expression. Apo-ALPK1-(N+K) was purified from E. coli BL21 
(DE3) ΔhldE and phosphorylation of TIFA was assessed by anti-pT9-TIFA 
immunoblotting (d, i). b, c, f, h, k, NF-κB activation was assessed 
by luciferase reporter assay (mean ± s.d. from three technical replicates); 
two-tailed unpaired Student's t-test was performed (*P < 0.05; **P < 0.01; 
***P < 0.001). Data shown are representative of three (a-d, h, i, k) or two 
(f, g, j) independent experiments. 
Extended Data Fig. 8 | ALPK1 Q67A/Y68A mutants that can 
discriminate cytosolic HBP from ADP-Hep resist activation by ADP-heptose 
7-P. a-c, Effects of ALPK1 Q67A, Y68A or Q67A/Y68A double 
mutations on activation of NF-κB by HBP, ADP-Hep and ADP-heptose 
7-P. ALPK1−/− 293T cells expressing the indicated Flag-ALPK1 mutants 
were electroporated with excess HBP or ADP-LD-Hep (a), or treated with 
excess ADP-heptose 7-P or ADP-LD-Hep (c). Anti-Flag immunoblots (b) 
show expression of the ALPK1 mutants. d, In vitro TIFA phosphorylation 
assay of ALPK1 Q67A/Y68A activation by HBP, ADP-heptose 7-P, or 
ADP-LD-Hep. Flag-ALPK1 Q67A/Y68A was purified from 293T cells. 

e, f, In vitro TIFA phosphorylation assay of ALPK1 Q67A/Y68A activation 
by small-molecule extracts from HBP-electroporated cells (SE"?). 293T 


cells electroporated with HBP were used to prepare the small-molecule 
extracts, and synthetic HBP was included as the control. Apo-ALPK1-(N+K) 
(e) and Flag-ALPK1 (f) were purified from E. coli BL21 (DE3) 
ΔhldE and 293T cells, respectively. g, Effects of ALPK1 Q67A/Y68A 
double mutations on Y. pseudotuberculosis-induced activation of 
NF-κB. ALPK1−/− 293T cells expressing Flag-ALPK1 (wild-type or the 
Q67A/Y68A mutant) were left uninfected or infected with indicated 
Y. pseudotuberculosis strains. a, c, g, NF-κB activation was assessed by 
luciferase reporter assay (mean ± s.d. from three technical replicates); 
two-tailed unpaired Student's t-test was performed (*P < 0.05; **P < 0.01; 
***P < 0.001). All data are representative of at least two independent 
experiments. 
Extended Data Fig. 9 | ADP-Hep but not HBP induces Alpk1-dependent 
cytokine expression in mice. HBP or ADP-LD-Hep (2 mg kg⁻¹) or the 
saline control were injected into the dorsal air pouches of wild-type (a-d) 
or Alpk1−/− (d) mice (C57BL/6). n (biologically independent animals) = 5 
for each group of treatment. Cytokine profiling was determined by the 
multiplex immunoassay (a-c) or ELISA (d). a, b, Heat maps of indicated 
cytokine concentrations in the air pouch (a) and the serum (b) of injected 
mice. c, d, Cytokine concentrations in the serum shown as mean ± s.e.m. 
(two-tailed unpaired Student's t-test, *P < 0.05, **P < 0.01, ***P < 0.001; 
NS, not significant). All data are representative of two independent 
experiments. 
Extended Data Table 1 | Data collection and refinement statistics for ALPK1-NTD structure (a) and Dali-search results of the structure (b) 

a
                        Se_ALPK1 (1-451)            Native ALPK1 (1-446)
Data collection
  Space group           P4₁2₁2                      C2
  Cell dimensions
    a, b, c (Å)         185.52, 185.52, 334.11      169.26, 214.39, 173.15
    α, β, γ (°)         90.00, 90.00, 90.00         90.00, 110.33, 90.00
  Wavelength (Å)        0.97776                     0.97853
  Resolution (Å)ᵃ       50.00-4.46 (4.57-4.46)      50.00-2.59 (2.66-2.59)
  Rmerge                0.174 (1.033)               0.069 (0.660)
  I/σ(I)                9.42 (2.32)                 14.62 (1.99)
  Completeness (%)      99.9 (100.0)                96.8 (95.8)
  Redundancy            14.1 (13.0)                 3.6 (3.6)
Refinement
  Resolution (Å)                                    48.89-2.59
  No. of reflections                                173,963
  Rwork/Rfree                                       0.2420/0.2628
  No. of atoms
    Protein                                         30,723
    Water                                           0
  B factors
    Protein                                         62.91
    Water                                           0
  r.m.s. deviations
    Bond lengths (Å)                                0.005
    Bond angles (°)                                 0.599

b
No.  PDB-chain  Z score  RMSD (Å)  Aligned residues  Identity (%)  Molecule name
1    5a6c-B     17.0     4.7       269               12            G-protein-signaling modulator 2, AFADIN
2    5a7d-B     16.9     6.4       269               12            PINS
3    5205-O     16.4     4.8       271               10            Anaphase-promoting complex subunit 1
4    5dbk-A     15.6     3.5       238               13            Transcriptional regulator/TPR domain protein
5    3txm-A     14.9     4.2       231               11            26S proteasome regulatory complex subunit P42B
6    4i1a-B     14.8     3.2       228               8             Response regulator aspartate phosphatase I
7    5001-A     14.8     3.4       215               13            BKLC (Bacterial Kinesin-Light Chain-like)
8    4gyo-B     13.7     4.5       240               12            Response regulator aspartate phosphatase J
9    4ui9-F     13.4     4.0       233               8             Anaphase-promoting complex subunit 1
10   5l4k-Q     13.3     3.9       242               11            26S proteasome non-ATPase regulatory subunit 4

In a, a single SeMet-labelled or native crystal was used for data collection and structure determination. 
ᵃValues in parentheses are for the highest-resolution shell. 
Widespread intronic polyadenylation inactivates 
tumour suppressor genes in leukaemia 


Shih-Han Lee¹,⁵, Irtisha Singh²,³,⁵, Sarah Tisdale¹, Omar Abdel-Wahab⁴, Christina S. Leslie² & Christine Mayr¹* 


DNA mutations are known cancer drivers. Here we investigated 
whether mRNA events that are upregulated in cancer can 
functionally mimic the outcome of genetic alterations. RNA 
sequencing or 3’-end sequencing techniques were applied to normal 
and malignant B cells from 59 patients with chronic lymphocytic 
leukaemia (CLL)¹⁻³. We discovered widespread upregulation of 
truncated mRNAs and proteins in primary CLL cells that were 
not generated by genetic alterations but instead occurred by 
intronic polyadenylation. Truncated mRNAs caused by intronic 
polyadenylation were recurrent (n = 330) and predominantly 
affected genes with tumour-suppressive functions. The truncated 
proteins generated by intronic polyadenylation often lack the 
tumour-suppressive functions of the corresponding full-length 
proteins (such as DICER and FOXN3), and several even acted in an 
oncogenic manner (such as CARD11, MGA and CHST11). In CLL, 
the inactivation of tumour-suppressor genes by aberrant mRNA 
processing is substantially more prevalent than the functional loss 
of such genes through genetic events. We further identified new 
candidate tumour-suppressor genes that are inactivated by intronic 
polyadenylation in leukaemia and by truncating DNA mutations in 
solid tumours. These genes are understudied in cancer, as their 
overall mutation rates are lower than those of well-known tumour- 
suppressor genes. Our findings show the need to go beyond genomic 
analyses in cancer diagnostics, as mRNA events that are silent at 
the DNA level are widespread contributors to cancer pathogenesis 
through the inactivation of tumour-suppressor genes. 

In addition to DNA-based mutations, recent studies found that alterations 
in mRNA processing, including splicing, promote tumorigenesis. 
In CLL, up to one-quarter of patients have mutations in ATM or SF3B1, 
but one-third have fewer than two mutated driver genes, and most 
patients (58%) only have a 13q deletion or have a normal karyotype. 
Here, we investigated whether intronic polyadenylation (IPA) might 
serve as a new driver of tumorigenesis. Because 16% of genes in normal 
immune cells use IPA to generate truncated mRNAs that contribute 
to transcriptome diversity, we hypothesized that cancer-specific IPA 
would generate truncated proteins that lack essential domains, and 
thus, may phenocopy truncating (TR) mutations (Fig. 1a). 

Using 3’-seq, a 3’-end sequencing method, on 44 samples including 
B cells from healthy donors and from patients with CLL, we identified 
5,587 IPA isoforms, including 3,484 without previous annotation 
(Extended Data Table 1 and Methods). We validated 4,630 IPA isoforms 
using RNA sequencing (RNA-seq) and additional 3’-seq data 
(Extended Data Fig. 1a, b). To assess IPA usage in CLL, we first identified 
the normal B cell subset, the gene expression profile of which was 
most closely related to CLL cells. Lymphoid tissue-derived CD5⁺ B cells 
were most similar (Extended Data Fig. 2), but clustered separately from 
CLL samples based on IPA site usage (Extended Data Fig. 1c). Using a 
generalized linear model (GLM), we identified 931 IPA events with significantly 
higher expression among 13 CLL samples, but low or absent 
expression in CD5⁺ B cells (Fig. 1b, Extended Data Fig. 1d). Because 
CLL IPAs are detectable by RNA-seq, we used an unrelated RNA-seq 


dataset to validate our CLL-IPA events (Fig. 1c). We verified up to 71% 
of testable IPAs by this independent method and dataset (Extended 
Data Fig. 1d). For further analysis, we combined the datasets (n = 59 
CLL samples) and focused only on CLL-IPAs that were present in more 
than 10% of the sample cohort resulting in 330 CLL-IPAs, derived from 
306 genes (Fig. 1d, Supplementary Table 1). Although CLL-IPAs were 
detected in all CLL samples, one-third of the samples had a significantly 
higher number of CLL-IPAs (Fig. 1e, Extended Data Fig. 1e). 
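
The selection logic in this paragraph can be sketched in Python as two filters: one for events with significantly higher usage in CLL, and one for recurrence across the cohort. This is a minimal illustration, not the authors' pipeline. The cut-offs (FDR-adjusted P < 0.1, usage difference > 0.05, TPM in CD5⁺ B cells < 8) are taken from the Fig. 1d legend, and the recurrence rule (more than 10% of 59 samples) from the text; the `IpaEvent` record, its field names, the function names and the example values are all hypothetical, and the GLM that would produce the adjusted P values is not reproduced.

```python
from dataclasses import dataclass


@dataclass
class IpaEvent:
    """Hypothetical per-event summary; field names are illustrative only."""
    gene: str
    fdr: float             # FDR-adjusted P value reported by the GLM
    usage_diff: float      # IPA usage in CLL minus usage in CD5+ B cells
    tpm_cd5_b: float       # IPA isoform expression in CD5+ B cells (TPM)
    n_samples_with_event: int


def is_cll_ipa(event: IpaEvent, fdr_max: float = 0.1,
               min_diff: float = 0.05, tpm_max: float = 8.0) -> bool:
    """Apply the three cut-offs quoted in the Fig. 1d legend."""
    return (event.fdr < fdr_max
            and event.usage_diff > min_diff
            and event.tpm_cd5_b < tpm_max)


def is_recurrent(event: IpaEvent, cohort_size: int = 59,
                 min_fraction: float = 0.10) -> bool:
    """Keep events present in more than 10% of the cohort."""
    return event.n_samples_with_event > min_fraction * cohort_size


# Made-up example events (gene names are placeholders, not results).
events = [
    IpaEvent("GENE_A", fdr=0.01, usage_diff=0.30, tpm_cd5_b=1.2, n_samples_with_event=20),
    IpaEvent("GENE_B", fdr=0.20, usage_diff=0.40, tpm_cd5_b=0.5, n_samples_with_event=30),
    IpaEvent("GENE_C", fdr=0.02, usage_diff=0.10, tpm_cd5_b=2.0, n_samples_with_event=5),
]

recurrent_cll_ipas = [e.gene for e in events if is_cll_ipa(e) and is_recurrent(e)]
print(recurrent_cll_ipas)
```

With a cohort of 59, the 10% rule means an event must be seen in at least six samples, which is why the third example event (five samples) is excluded even though it passes the usage cut-offs.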

To investigate whether CLL-IPAs express truncated proteins, we per- 
formed western blots on 13 candidates. Whereas normal B cells only 
expressed the full-length proteins, the malignant B cells also expressed 
truncated proteins, the size of which was consistent with the predicted 
size of IPA-generated proteins (Fig. 2a, Extended Data Figs. 3 and 4). 

To rule out that proteolytic cleavage truncates the proteins, we vali- 
dated the presence of the IPA-generated truncated mRNAs (Extended 
Data Fig. 5a). Moreover, we were able to induce IPA isoform expression 
through the downregulation of splicing factors or through the inhibi- 
tion of 5’ splice site recognition using an antisense oligonucleotide, 
indicating that deregulated mRNA processing can cause the expression 
of a truncated protein (Extended Data Fig. 5b). 

Many of the truncated proteins generated by CLL-IPAs are markedly 
similar to the predicted protein products produced by TR mutations, 
suggesting that CLL-IPAs may functionally mimic the outcome of 
genetic mutations (Fig. 2b, Extended Data Fig. 6a). To test this, we 
investigated the functional consequences of the expression of IPA and 
full-length protein isoforms of four candidates in malignant B cells. 
CARD11 is a positive regulator of the NF-κB pathway and is important 
for lymphocyte survival and proliferation. We observed substantial 
CARD11 IPA protein production, compared to only slightly increased 
CARD11 IPA mRNA expression, indicating that the truncated protein 
is more stable and may activate the NF-κB signalling pathway more 
potently than the full-length protein (Fig. 2a). To test this, we exclusively 
knocked down either full-length or CARD11 IPA in a malignant 
B cell line that expresses CARD11 IPA at comparable levels to those 
expressed by CLL cells (Extended Data Fig. 6b, c). We measured phosphorylated 
p65 (also known as RELA) to assess NF-κB activity and 
found significantly lower activity after knockdown of CARD11 IPA 
than of the full-length protein (Fig. 2c, Extended Data Fig. 6d). Thus, 
CARD11 IPA activates NF-κB more potently than full-length CARD11, 
suggesting that it may mimic activating mutations present in high-grade 
lymphomas. CARD11 IPA may contribute to NF-κB activation 
in CLL, in which the signalling components are rarely mutated. 

DICER IPA generates a truncated protein that partially lacks the RNase 
IIIB domain responsible for microRNA (miRNA) processing (Fig. 2b). In 
contrast to full-length DICER, DICER IPA entirely lacks miRNA cleavage 
ability and mimicked TR mutations that remove both RNase III domains 
(Fig. 2b, d, Extended Data Fig. 6e, f). Although DICER IPA does not act 
in a dominant-negative manner, its expression reduces functional DICER 
protein, thus potentially decreasing endogenous miRNA expression. 

The tumour-suppressor gene (TSG) MGA is targeted by TR mutations 
in CLL and solid cancers (Fig. 2b). MGA negatively regulates 


¹Cancer Biology and Genetics Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA. ²Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New 
York, NY, USA. ³Tri-I Program in Computational Biology and Medicine, Weill Cornell Graduate College, New York, NY, USA. ⁴Human Oncology and Pathogenesis Program, Memorial Sloan Kettering 
Cancer Center, New York, NY, USA. ⁵These authors contributed equally: Shih-Han Lee, Irtisha Singh. *e-mail: mayre@mskcc.org 
Fig. 1 | Hundreds of genes generate recurrent CLL-IPAs. a, Schematic 
showing full-length mRNA and protein expression in normal cells and 
the generation of a truncated mRNA and protein through cancer-specific 
IPA, despite no difference in DNA sequence. Polyadenylation sites (pA) 
are shown in light green. Loss of essential protein domains (dark green 
boxes) through cancer-gained IPA may inactivate TSGs, thus contributing 
to cancer pathogenesis. b, Representative CLL-IPAs (from n = 330) are 
shown. mRNA 3’ ends detected by 3’-seq are depicted as peaks, the heights 
of which correspond to transcript abundance shown in transcripts per 
million (TPM). The bottom panel shows RNA-seq reads and numbers 
correspond to read counts. Full-length and IPA-generated truncated 
proteins are depicted in grey, known domains are shown in green and 
the domains lost through IPA are named. For CLL-IPA, the number 
of retained and novel amino acids (aa) and amino acids of full-length 
proteins are given. CC, coiled-coil; MemB, memory B cells; NB, naive B cells. 
c, Representative RNA-seq tracks from two independent CLL datasets 
are shown as in b; one is indicated by ‘L’ before the patient number (CLL-L14). 
B3 denotes donor 3. Zoomed-in view shows the exonized part of 
intron 23 of DICER1 (green). d, Difference in relative abundance (usage) 
of IPA isoforms between CLL and normal CD5⁺ B cells. A GLM was used 
to identify significant events. CLL-IPAs with significantly higher usage 
are shown in red (false discovery rate (FDR)-adjusted P < 0.1, usage 
difference > 0.05, TPM in CD5⁺ B < 8) and CD5⁺ B-IPAs are shown 
in blue. Grey denotes IPAs present in CLL and CD5⁺ B cells without 
significantly different usage. e, Number of CLL-IPAs per sample is shown 
as box plots, in which the horizontal line denotes the median; boxes denote 
the 25th and 75th percentiles; error bars denote the range. CLL high, 
n = 21/59, median of CLL-IPAs/sample = 98 versus CLL low, n = 38/59, 
median = 29. ***P = 6 × 10⁻¹⁰, two-sided Mann-Whitney U-test. 
Fig. 2 | IPA-generated truncated proteins resemble the protein products 
of truncating DNA mutations and have cancer-promoting properties. 
a, RNA-seq and 3’-seq data of functionally validated CLL-IPAs (n = 5) 
as in Fig. 1b. The remaining tracks are shown in Extended Data Fig. 3. 
Endogenous full-length proteins are detected by western blot analysis in 
CLL and normal B cells (B lymphoblastoid cells; BLCL), whereas IPA-generated 
truncated proteins (red arrows) are only present in primary CLL 
cells. Actin was used as loading control. The experiment was replicated 
with similar results (CARD11, n = 4; DICER, n = 3; MGA, n = 2). For gel 
source data see Supplementary Fig. 1. Asterisks denote an unspecific band. 
b, Protein models are shown as in Fig. 1b. The amino acid positions of 
recurrent TR mutations are shown in blue. c, Endogenous phospho-NF-κB 
p65 levels are shown as normalized mean fluorescent intensity (MFI) 
values after short hairpin RNA (shRNA)-mediated knockdown of full-length 
(FL) CARD11 and CARD11 IPA. n = 5 (FL shRNA1 and control 
(ctrl) shRNA) or n = 6 (IPA; shRNA2 n = 3, shRNA3 n = 3) biologically 
independent experiments. Data are mean ± s.d. **P = 0.002, two-sided 
Kruskal-Wallis test; P value of two-sided Mann-Whitney U-test was 
adjusted for multiple testing, *adjusted P = 0.036. d, miRNA cleavage 
assay, performed twice with similar results, showing processing of 
pre-let-7i into mature let-7i by V5-DICER. Mock indicates that no protein 
was added. V5-DICER IPA shows a complete loss of function, but no 
dominant-negative activity. nt, nucleotides. e, RT-PCR of endogenous 
MYC target genes after expression of full-length or MGA IPA in Raji cells. 
Shown are GAPDH-normalized values as mean ± s.d. from three biological 
replicates, each performed in technical triplicates. *P < 0.05, **P < 0.001, 
two-sided t-test for independent samples. NS, not significant. Exact 
P values are shown in Supplementary Fig. 1. MGA represses all MYC 
target genes. BS, binding sites. 
Fig. 3 | TSGs are enriched among CLL-IPAs. CLL-IPAs and TR mutations
in CLL target the same genes but in different patients. a, The fraction of
the retained coding region (CDR) is shown for genes that generate
CLL-IPAs (n = 306, median fraction of retained CDR = 0.21; 112 amino
acids) and B-IPAs (n = 2,690, median fraction of retained CDR = 0.45; 221
amino acids). ***P = 1 × 10⁻¹⁰, two-sided Mann-Whitney U-test.
Box plots are as in Fig. 1e. b, RT-PCR analysis showing the expression
of full-length and IPA isoforms for two TSGs (DICER1 and NUP98) in
samples from two patients with CLL that were collected over several years.
CLL11: T1, 17 months after diagnosis, T2, 24 months, T3, 44 months;
CLL6: T1, 16 months, T2, 49 months, T3, 91 months (42 months after


the MYC transcriptional program and represses genes with MYC- and 
E2F-binding sites in a Polycomb-dependent manner. Expression of
MGA from constructs validated MGA IPA detected in CLL cells and 
confirmed the repressive effect of MGA on MYC target gene expression 
in malignant B cells (Fig. 2e, Extended Data Fig. 6g). Notably, on genes 
with binding sites for both MYC and E2F, MGA IPA acts as domi- 
nant-negative regulator of full-length MGA as it significantly induced 
the expression of 5 out of 6 genes in cells that endogenously express full- 
length MGA (Fig. 2e). However, as MGA IPA retains the N-terminal 
T-box, it still acts as a repressor on T-box target genes (Fig. 2e). 

Lastly, the IPA isoform of the transcriptional repressor FOXN3
derepressed its oncogenic targets MYC and PIM2 (Extended Data 
Figs. 3, 6h-j). In summary, the CLL-IPA-generated proteins can con- 
tribute to cancer pathogenesis in various ways. Their generation can 
reduce the expression of functional TSGs (DICER and FOXN3 IPA) 
or they behave as dominant-negatives, thus acting in an oncogenic 
manner (MGA IPA). 

Because all functionally validated CLL-IPAs produced dysfunc- 
tional proteins, we investigated whether this is a general feature. We 
compared the retained fraction of amino acids of IPA isoforms pres- 
ent in normal B cells (B-IPA, n = 2,690) with CLL-IPAs. Although 
the protein size of full-length proteins targeted by IPA was similar, 
CLL-IPAs lose significantly more amino acids than B-IPAs (Fig. 3a, 
Extended Data Fig. 7a). This suggests that IPA in normal cells contributes
to proteome diversity, whereas CLL-IPAs tend to produce
dysfunctional proteins. 

Because genes targeted by TR mutations are often TSGs (Extended
Data Fig. 7b), we investigated whether TSGs are overrepresented 
among CLL-IPAs. Compared to control groups with matched protein 
sizes, there was a significant enrichment of TSGs among CLL-IPAs 
(P=3 x 107°; Extended Data Fig. 7c-f). Importantly, IPA-generated 
truncated proteins usually lack either more or a comparable num- 
ber of amino acids compared to truncated proteins generated by TR 

treatment). Shown are the exons that contain primers for amplifications
of the products. BLCL serve as control cells. The expression of HPRT 
was used as a loading control. c, Genes that are targeted by TR mutations 
in CLL and CLL-IPAs are shown (n = 36). Dark green bars indicate the 
fraction of retained CDRs for each IPA-generated protein. Black dots 
indicate the positions of TR mutations in CLL. CLL-IPAs occur mostly 
in the vicinity of or upstream of TR mutations. P = 0.004, two-sided 
Wilcoxon rank-sum test. Right, the fraction of CLL samples affected is 
shown for each gene and represents the fraction of CLL samples (out of 59) 
with significantly upregulated expression of the IPA isoform (CLL-IPA, 
grey; TR mutations, red). 


mutations, suggesting the IPA isoforms are probably inactive (Extended 
Data Fig. 7c). However, for CLL-IPAs to inactivate TSGs, they must 
also be stably expressed. For 11 out of 12 tested CLL-IPAs, we observed 
stable expression at the mRNA or protein level over a four-year time 
span (Fig. 3b, Extended Data Fig. 5c, d), indicating that they have the 
potential to inactivate TSGs. 

In addition to TSGs in general, we found that genes inactivated by TR 
mutations in CLL are enriched among CLL-IPAs (Fig. 3c, Extended
Data Fig. 7g). Notably, the fraction of samples affected by CLL-IPA was
substantially larger than the fraction of CLL samples affected by TR
mutations (3.0-85% versus 0.13-2.0%; Fig. 3c, right). This indicates
that TR mutations and CLL-IPAs target the same genes in different 
patient groups, thus substantially expanding the proportion of patients 
with protein truncations in potential drivers. 

To rule out the possibility that CLL-IPAs are caused by somatic 
mutations, we examined the presence of DNA mutations in the CLL- 
IPA genes. Two genes were targeted by TR mutations and IPA in the 
same patient. Notably, the two inactivation mechanisms are predicted 
to generate different truncated protein products, suggesting that they 
occurred independently (Extended Data Fig. 7h, i). The mutation data
also enabled us to associate CLL-IPAs with specific somatic mutations. 
CLL samples with a high number of IPA were enriched in SF3B1 muta- 
tions, but they were independent of IGVH mutational status (Extended 
Data Fig. 7j-l). 

Because of the enrichment of known TSGs among CLL-IPAs, we 
examined whether CLL-IPAs may enable us to identify novel TSGs. 
We selected CLL-IPAs present in at least 20% of CLL samples (n = 199, 
generated from 190 genes; Fig. 4a, Supplementary Tables 1 and 2). We 
next investigated whether these genes are inactivated by TR mutations 
in solid cancers using mutations from more than 86,000 tumours, compiled
by the MSK cBioPortal. We observed that 72% of these genes
are frequently affected by TR mutations in solid tumours and call them 
novel TSG candidates (136 out of 190; Fig. 4b). This is a significant 

Fig. 4 | Novel TSG candidates are inactivated in CLL at the mRNA level
and in solid tumours at the DNA level. a, Colour-coded IPA usage for a
subset of CLL-IPAs (97 out of 199 of samples with significant expression
of IPA in >20% of CLL samples). Gene names and number of affected
CLL samples per CLL-IPA are indicated (blue bars, 3’-seq, green bars,
RNA-seq). b, Truncating mutation rates (number of TR mutations per
total mutations) in solid tumours, obtained from the MSK cBioPortal for
genes that generate abundant CLL-IPAs, partially shown in a. The bimodal
distribution was separated at the local minimum (TR mutation per total
mutations = 0.12, red line) into two gene groups: those rarely targeted
by TR mutations and those with high TR mutation rates in solid cancers,
defined as novel TSG candidates. c, TR mutation rates of known and novel
TSG candidates. **P = 0.0002, two-sided Mann-Whitney U-test. Box
plots are as in Fig. 1e. d, As shown in c, but for overall mutation rates.
**P — 1 x 1071, two-sided Mann-Whitney U-test. e, CHST11 protein 

enrichment over background and the list contains 17 known TSGs and
119 novel TSG candidates (Extended Data Fig. 8a, b). Again, CLL-IPAs
lack more or a comparable number of amino acids as the proteins
produced by TR mutations, suggesting that CLL-IPAs inactivate the 
functions of these genes (Extended Data Fig. 8a). 

Although the TR mutation rates of the novel TSG candidates were 
comparable with known TSGs found at the lower end of the spectrum, 
their protein size and overall mutation rates were substantially lower 
(Fig. 4c, d, Extended Data Fig. 8c). This may explain why these potentially
cancer-relevant genes have been overlooked thus far. As they are
targeted at the mRNA level in leukaemia and at the DNA level in solid 
cancers, they should be considered as a novel class of TSG candidates. 

models, shown as in Fig. 2b. Loops depict membrane domains.
A chromosomal translocation in CLL results in fusion of the immunoglobulin
heavy chain locus (IGH) with a truncated CHST11. NT, nucleotide.
f, Western blot of WNT5B, performed once, shown as in Fig. 2a, from cell
lysates of, or conditioned medium obtained from, B cells stably expressing
green fluorescent protein (GFP), GFP-tagged CHST11 or GFP-tagged
CHST11 IPA. Conditioned medium from cells expressing CHST11
IPA contains unglycosylated WNT5B. Asterisk, unspecific band.
g, Conditioned medium from samples described in f was added to
HEK293T cells expressing a WNT reporter, and normalized luciferase
activity is shown. Data are mean ± s.d. from n = 7 biologically independent
experiments. **P = 0.002, two-sided Kruskal-Wallis test; P value of
two-sided Mann-Whitney U-test was adjusted for multiple testing,
**adjusted P = 0.002.


To support this, we functionally validated a highly recurrent CLL-IPA 
isoform that affected a poorly known cancer gene. CHST11 encodes a 
Golgi-associated carbohydrate sulfotransferase that modifies chondroi- 
tin on the surface of WNT-expressing cells. The modification results 
in the binding of secreted WNT and prevents its paracrine action.
CHST11 IPA lacks catalytic activity, but retains the cytoplasmic tail
(Fig. 4e, Extended Data Fig. 8d). As exclusive expression of the cytoplasmic
tail of Golgi enzymes inhibited localization of full-length
enzymes, we hypothesized that CHST11 IPA may act in a dominant-negative
manner. We expressed CHST11 and CHST11 IPA, collected
the conditioned media, and detected secreted WNT in medium only
after expressing CHST11 IPA (Fig. 4f, Extended Data Fig. 8e, f). The
conditioned medium activated a WNT reporter in HEK293T cells
(Fig. 4g), demonstrating that CHST11 IPA enabled paracrine WNT
action on neighbouring cells through dominant-negative action. Thus,
in addition to mutations in the WNT pathway, CLL-IPAs may also
contribute to WNT activation in CLL. 

A member of this new class of TSGs was recently found in breast 
cancers, in which tumour-specific expression of MAGI3 IPA generates 
a truncated protein with dominant-negative activity (Extended Data
Fig. 9a). Combined with our findings on T-lineage acute lymphoblastic 
leukaemia (T-ALL), in which we detected more than 100 IPA isoforms 
(Extended Data Fig. 9b), these data indicate that cancer-upregulated 
IPA isoforms are not restricted to CLL. 

In summary, we found that TSGs can be inactivated, either fully or 
partially, by IPA. Even partial loss of TSG function was shown to contribute
crucially to tumorigenesis. As CLL-IPAs are not generated by
DNA mutations in their corresponding transcription units, DNA and 
mRNA alterations occur in different patient groups. In CLL, the frac- 
tion of patients with TSGs that are inactivated by CLL-IPAs is consid- 
erably larger than those with TSGs disrupted by TR mutations (Fig. 3c); 
thus, CLL-IPAs substantially expand the number of patients with 
affected drivers. Moreover, these data identify a class of TSGs that is 
predominantly inactivated at the mRNA rather than the DNA level.
Thus, our study demonstrates that cancer-gained changes in mRNA 
processing can functionally mimic the effects of somatic mutations and 
shows the need to go beyond genomic analyses in cancer diagnostics. 

Any Methods, including any statements of data availability and Nature Research
reporting summaries, along with any additional references and Source Data files, 
are available in the online version of the paper at https://doi.org/10.1038/s41586- 

METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Samples for 3’-seq and RNA-seq analyses. Samples were obtained from untreated 
patients with CLL seen at Memorial Sloan Kettering Cancer Center, New York 
(Extended Data Table 1a). All patients provided written informed consent before 
participating in the study. The sample collection was approved by the Institutional 
Review Board of Memorial Sloan Kettering Cancer Center. Peripheral blood mon- 
onuclear cells from CLL samples with a minimum white blood cell count of 75,000 
per microlitre were isolated by Ficoll (GE Healthcare) gradient centrifugation at 
400 r.c.f. for 30 min, followed by two washes in PBS at room temperature. Cells were 
treated with red blood cell lysis buffer (155 mM NH4Cl, 12 mM NaHCO3, 0.1 mM
EDTA) for 5 min at room temperature and were washed twice with PBS. Pure
CLL B cells were obtained from peripheral blood mononuclear cells using a B-CLL
isolation kit (Miltenyi Biotec). This selected untouched CLL cells using a cocktail
of magnetic beads coated with CD2, CD3, CD4, CD14, CD15, CD16, CD56,
CD61, CD235a, FcεRI and CD34. The purity of CLL B cells (CD5+ and CD19+)
was analysed by FACS and the cells were immediately dissolved in TRI Reagent 
(Ambion) for RNA extraction, followed by 3’-seq or RNA-seq library preparation. 

For longitudinal analyses, samples from two patients were investigated at 
different time points during the course of the disease. CLL11, time point 1 (T1) 
17 months after diagnosis, T2, 24 months after diagnosis, T3, 44 months after 
diagnosis. The patient was not treated with chemotherapy during the sample col- 
lection period. CLL6: T1, 16 months after diagnosis, T2, 49 months, T3, 91 months 
(42 months after chemotherapeutic treatment). 

In addition to the newly generated CLL 3’-seq data, we also used 3’-seq data 
from normal tissues, cell lines and immune cell subsets that we generated previously
(Extended Data Table 1b).

We performed RNA-seq on 11 CLL samples (Extended Data Table 1a) and 
obtained access to a previously published RNA-seq dataset from 44 patients with 
CLL that was provided by D. A. Landau. RNA-seq data from normal immune cells
were obtained from samples we generated previously (Extended Data Table 1c).
For validation of 3’-seq data, we also used publicly available RNA-seq (tonsil-derived
NB, GSE45982 (GSM1129340-GSM1129347), blood-derived NB,
ERR431624, ERR431586, CD3+ T cells, GSM1576415) and 3’-seq data.

For RNA-seq-based identification of IPA isoforms expressed in T-ALL, we used 
publicly available RNA-seq data from 10 primary T-ALL samples and 2 whole 
human thymus extracts (GSE57982). 

FACS sorting of immune cell populations. Cells were washed with ice-cold PBS 
once, incubated with appropriate fluorochrome-conjugated antibodies for 30 min 
at 4 °C and washed twice with ice-cold PBS containing 0.5% FCS. The following 
antibodies were used: anti-CD3-PE (mouse, BD Biosciences, 555333), anti-CD5- 
FITC (mouse, BD Biosciences, 555352), anti-CD14-PECy7 (mouse, ebioscience, 
25-0149-42), anti-CD19-APC (mouse, BD Biosciences, 555415), anti-CD27-PE 
(mouse BD Biosciences, 555441), anti-CD38-APC (mouse, BD Biosciences, 
555462), anti-CD38-FITC (mouse, BD Biosciences, 555459). Surface protein 
expression was detected by a BD FACSCalibur cell analyser (BD Biosciences) and 
data were analysed using the FlowJo software. 

3’-seq and RNA-seq analyses. 3'-seq libraries were generated as previously 
described and sequenced with Illumina HiSeq using single-end 50-nucleotide 
reads. RNA-seq libraries were prepared at the Weill Cornell and the MSKCC
Genomics core facilities. 

Analysis of 3’-seq data was performed as described previously with a few
modifications that have been extensively described. In brief, a gene is considered
to be expressed if either the IPA isoform (>5 TPM) or the full-length isoform
(>5.5 TPM) was expressed in 75% of the samples of a particular cell type. We
focused our analysis on robustly expressed transcript isoforms and filtered 3’-seq
peaks according to their usage. Robustly expressed 3’UTR isoforms that are part
of the atlas are expressed with at least 3 TPM in at least one sample and each peak
combines at least 10% of all reads that map to the 3’UTR. Robustly expressed
IPA isoforms that are part of the atlas are expressed with 5 TPM or more and had
>0.1 IPA site usage in at least one sample. IPA site usage is the relative expression
of each IPA isoform with respect to the total expression of 3’UTR isoforms (all
reads that fall into robust 3’UTR peaks are summed up). We only analysed IPA
isoforms of protein coding genes.
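
The usage definition and the robustness filter above can be sketched in a few lines of Python. This is an illustrative reading rather than the pipeline used in the study: the function names are hypothetical, and the denominator is taken literally from the text as the summed TPM of the robust 3’UTR isoforms of a gene (whether the IPA reads themselves are also counted in that total is not specified here).

```python
from typing import Iterable, Sequence, Tuple


def ipa_site_usage(ipa_tpm: float, utr_isoform_tpms: Sequence[float]) -> float:
    """Relative expression of an IPA isoform versus the summed robust 3'UTR isoforms.

    Assumption: the denominator is the sum of the 3'UTR isoform TPMs only.
    """
    total = sum(utr_isoform_tpms)
    if total <= 0:
        return 0.0  # no 3'UTR signal: usage is undefined, reported here as 0
    return ipa_tpm / total


def is_robust_ipa(per_sample: Iterable[Tuple[float, float]]) -> bool:
    """True if at least one sample has IPA TPM >= 5 and IPA site usage > 0.1.

    per_sample holds one (ipa_tpm, usage) pair per sample.
    """
    return any(tpm >= 5.0 and usage > 0.1 for tpm, usage in per_sample)


if __name__ == "__main__":
    usage = ipa_site_usage(6.0, [20.0, 10.0])
    print(round(usage, 3), is_robust_ipa([(2.0, 0.3), (6.0, usage)]))
```

Both thresholds must be met in the same sample in this sketch; a sample with high usage but low TPM does not qualify on its own.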

Validation of IPA isoforms using external data sources. To obtain evidence of IPA 
isoforms from independent methods, we first used RNA-seq data obtained from 
the same RNA or from the same cell type to identify IPA isoforms. We used the 
coordinates of the IPA events obtained from 3’-seq and tested the RNA-seq read 
counts in windows of 100 nucleotides located upstream and downstream of the IPA 
peak using a GLM (Extended Data Fig. 1a). The windows were separated by 51
nucleotides centred on the first nucleotide of the polyadenylation signal. Not all IPA
isoforms could be tested. For example, if the defined windows overlapped with an
annotated exon, the IPA event was excluded from further analysis. An IPA isoform
was considered present if we detected a significant difference in read counts between
the upstream and downstream windows (adjusted P < 0.1) using DESeq. This analysis
was also used to validate CLL-gained IPA events in an independent CLL dataset.
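
The window geometry described above can be made concrete with a small sketch. It only computes coordinates and does not reproduce the GLM/DESeq test. The function name, the 1-based inclusive coordinate convention and the strand handling are assumptions made for illustration.

```python
from typing import Dict, Tuple

Window = Tuple[int, int]  # 1-based, inclusive genomic coordinates


def flanking_windows(anchor: int, strand: str = "+",
                     window: int = 100, gap: int = 51) -> Dict[str, Window]:
    """Return the upstream and downstream counting windows around an IPA site.

    anchor is the first nucleotide of the polyadenylation signal. The two
    windows are separated by `gap` nucleotides centred on that position.
    """
    if gap % 2 != 1:
        raise ValueError("gap must be odd so that it can be centred on one nucleotide")
    if strand not in ("+", "-"):
        raise ValueError("strand must be '+' or '-'")
    half = gap // 2
    low = (anchor - half - window, anchor - half - 1)    # lower genomic coordinates
    high = (anchor + half + 1, anchor + half + window)   # higher genomic coordinates
    if strand == "+":
        return {"upstream": low, "downstream": high}
    # On the minus strand, transcript-upstream lies at higher genomic coordinates.
    return {"upstream": high, "downstream": low}


if __name__ == "__main__":
    print(flanking_windows(1000))
```

An IPA isoform is expected to show higher read coverage in the upstream window than in the downstream one, which is the difference the test above evaluates.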

We further regarded an IPA isoform as validated if reads that overlap with IPA
peaks had at least four untemplated adenosines in the RNA-seq data and a polyadenylation
signal (or one of its variants) was detected within 50 nucleotides
upstream of the read. In addition, we considered IPA isoforms as validated if we
detected read evidence in independent 3’-seq datasets. As no previous 3’-seq
data exist for many of our cell types, we also included highly expressed (>10 TPM
and >0.1 IPA site usage) IPA isoforms with an upstream polyadenylation signal
(AAUAAA and its variants) in our downstream analysis.
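
A minimal sketch of the read-level check described above follows. It assumes that the untemplated part of a read is available as the soft-clipped 3′ sequence and that the transcript-strand sequence ending at the last aligned base has been extracted separately. The function names are hypothetical, and the hexamer list is an illustrative subset (the canonical signal and one common variant, written in the DNA alphabet), not the variant set used in the study.

```python
PAS_HEXAMERS = ("AATAAA", "ATTAAA")  # illustrative subset, DNA alphabet


def has_untemplated_a_tail(soft_clipped_3p: str, min_a: int = 4) -> bool:
    """True if the soft-clipped 3' sequence starts with a run of >= min_a adenosines."""
    seq = soft_clipped_3p.upper()
    run = len(seq) - len(seq.lstrip("A"))
    return run >= min_a


def has_upstream_pas(upstream_seq: str, search_nt: int = 50) -> bool:
    """True if a polyadenylation signal occurs in the last search_nt nucleotides."""
    region = upstream_seq.upper()[-search_nt:]
    return any(hexamer in region for hexamer in PAS_HEXAMERS)


def supports_ipa_site(soft_clipped_3p: str, upstream_seq: str) -> bool:
    """Both criteria must hold for a read to count as evidence."""
    return has_untemplated_a_tail(soft_clipped_3p) and has_upstream_pas(upstream_seq)


if __name__ == "__main__":
    upstream = "GC" * 30 + "AATAAA" + "GC" * 10
    print(supports_ipa_site("AAAAAG", upstream))
```

Requiring both the adenosine run and a nearby signal guards against reads that merely end in genomically encoded A-rich sequence, which the first check alone cannot exclude.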

Identification of the normal counterpart of CLL and of CLL-IPAs. Hierarchical
clustering was performed on the normal human B cell subsets derived from lymphoid
tissues or peripheral blood and CLL samples using RNA-seq derived mRNA
expression levels (quantile normalized log2 reads per kilobase of transcript per
million mapped reads (RPKM)). Genes expressed with greater than 5.5 RPKM in
75% of normal B cells or any of the CLL samples went into the analysis. The 20%
most variable genes by median absolute deviation across the dataset were used for
the clustering. The heat map was generated using aheatmap
(http://cran.r-project.org/package=NMF) with row scaling. This analysis showed that
lymphoid-tissue derived CD5+ B cells are most closely related in their gene expression
profile to CLL cells (Extended Data Fig. 2).
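
The gene-selection step that precedes the clustering can be illustrated as follows. This sketch covers only the ranking by median absolute deviation (MAD) and the 20% cut-off; the normalization, the clustering itself and the heat map are not reproduced, and the function names and the toy expression values are hypothetical.

```python
from statistics import median
from typing import Dict, List, Sequence


def mad(values: Sequence[float]) -> float:
    """Median absolute deviation around the median."""
    centre = median(values)
    return median(abs(v - centre) for v in values)


def top_variable_genes(expr: Dict[str, List[float]], fraction: float = 0.2) -> List[str]:
    """Return the `fraction` of genes with the highest MAD across samples."""
    ranked = sorted(expr, key=lambda gene: mad(expr[gene]), reverse=True)
    n_keep = max(1, int(len(ranked) * fraction))
    return ranked[:n_keep]


if __name__ == "__main__":
    toy = {
        "g1": [1, 1, 1, 1],
        "g2": [1, 5, 9, 13],
        "g3": [2, 2, 3, 3],
        "g4": [0, 0, 0, 1],
        "g5": [4, 4, 4, 4],
    }
    print(top_variable_genes(toy))
```

MAD is less sensitive to single outlier samples than the variance, which is one plausible reason to prefer it when a few samples differ strongly from the rest.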

We performed hierarchical unsupervised clustering of CLL and control 
samples based on IPA site usage to test whether IPA site usage separates normal 
and malignant B cells (Extended Data Fig. 1c). The top 20% most variable genes 
by median absolute deviation across all the CD5+ B and CLL samples were used.
This analysis showed two main clusters: four CLL samples (CLL4, CLL7, CLL11
and CLL12) clustered separately from the rest of the samples. However, within the
rest of the samples, the control group (CD5+ B) clustered separately. The four CLL
samples that differed the most from the rest of the samples had a high number of
significantly upregulated IPA isoforms (CLL high: median number of CLL-IPAs
per sample, n = 100; range, n = 42-274), whereas the remaining samples had a
low number of CLL-IPAs (CLL low: median, n = 9; range, n = 5-28; Extended
Data Fig. 1e).

To identify CLL-upregulated IPA isoforms, we applied a GLM and tested
usage of each IPA isoform between the normal B cell group and each CLL sample.
We only considered IPA isoforms that were significantly upregulated in CLL (FDR-adjusted
P < 0.1, usage difference between CLL and CD5+ B > 0.05) and were
either not or lowly expressed in CD5+ B cells (TPM < 8, corresponding to 75%
quantile for CD5+ B TPM). This resulted in 931 significantly upregulated IPA
events observed in 13 CLL samples. Of these, 454 IPA events were detected in only a
single sample and were regarded as non-recurrent, whereas 477 IPA events
occurred in more than one sample (≥2 out of 13), and were considered recurrent
events by 3’-seq (Extended Data Fig. 1d). The recurrent events resulted in 168
recurrent CLL-IPA isoforms.
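
The selection criteria in this paragraph amount to a simple threshold filter followed by a recurrence count. The sketch below encodes the stated cut-offs; it assumes that the adjusted P value and the usage values have already been produced by the statistical test, and the function names are hypothetical.

```python
def is_cll_upregulated(adj_p: float, usage_cll: float, usage_cd5: float,
                       tpm_cd5: float) -> bool:
    """Apply the stated cut-offs: FDR-adjusted P < 0.1, usage difference > 0.05,
    and TPM < 8 in CD5+ B cells."""
    return adj_p < 0.1 and (usage_cll - usage_cd5) > 0.05 and tpm_cd5 < 8.0


def recurrence_class(n_samples_with_event: int) -> str:
    """Events seen in one sample are non-recurrent; two or more are recurrent."""
    if n_samples_with_event < 1:
        raise ValueError("an event must be observed in at least one sample")
    return "recurrent" if n_samples_with_event >= 2 else "non-recurrent"


if __name__ == "__main__":
    print(is_cll_upregulated(0.01, 0.30, 0.10, 2.0), recurrence_class(3))
    # The two classes partition the reported events: 454 + 477 = 931.
    print(454 + 477)
```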

As CLL-IPAs are detectable by RNA-seq, we used an independent RNA-seq 
dataset containing 46 CLL samples for validation. We verified up to 71% of testable
IPAs by this independent method and dataset. Because of the high validation rate, 
we combined the two datasets (n =59 CLL samples) and focused on CLL-IPAs 
present in more than 10% of the whole CLL sample cohort. This resulted in 330 
CLL-upregulated IPA isoforms, derived from 306 genes (Supplementary Table 1). 
The list of 330 CLL-upregulated IPA isoforms contains the 168 CLL-IPAs identified
in at least 2 out of 13 3’-seq samples, but also contains CLL-IPA isoforms
detected in one 3’-seq and in at least five additional RNA-seq samples (≥6 out of
59 total samples).

We detected 33 IPA events that showed significantly higher IPA site usage 
in CD5+ B cells compared with CLL. IPA site usage was required to be higher
than in 2 CLL samples (TPM <10, corresponding to 75% quantile for CLL
TPM; FDR-adjusted P < 0.1, usage difference between CLL and CD5+ B > 0.05;
Supplementary Table 1). 

The fraction of CLL patients affected by IPA or TR mutations shown in Fig. 3c and
Extended Data Figs. 7c and 8a was calculated as follows. If the CLL-IPA isoform
was testable by RNA-seq, all 59 CLL samples were considered. If the CLL-IPA
isoform could not be tested by RNA-seq (because, for example, the upstream exon
is located too close to the IPA isoform), then only the 13 CLL samples analysed by
3’-seq were taken into account for calculating the fraction of samples with significant
expression of the IPA isoform.
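
The choice of denominator described above can be written as a short helper. This is a sketch with a hypothetical function name; the cohort sizes (59 samples in total, 13 analysed by 3’-seq) are taken from the text.

```python
def fraction_affected(n_positive: int, testable_by_rnaseq: bool,
                      n_total: int = 59, n_3seq: int = 13) -> float:
    """Fraction of samples with significant expression of a CLL-IPA isoform.

    All samples count when the isoform is testable by RNA-seq; otherwise only
    the samples analysed by 3'-seq can contribute.
    """
    denominator = n_total if testable_by_rnaseq else n_3seq
    if not 0 <= n_positive <= denominator:
        raise ValueError("n_positive must lie between 0 and the number of samples considered")
    return n_positive / denominator


if __name__ == "__main__":
    print(fraction_affected(6, testable_by_rnaseq=True))
    print(fraction_affected(6, testable_by_rnaseq=False))
```

The same count of positive samples therefore yields a different fraction depending on whether the isoform was testable by RNA-seq, which matters when fractions are compared across genes.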

Cell lines. B lymphoblastoid cells (BLCL) are Epstein-Barr virus-immortalized 
human blood B cells. MEC1 cells are malignant B cells from B-prolymphocytic
leukaemia and were provided by O.A.-W. Raji and TMD8 cells are malignant B cells 
from lymphomas and were a gift from H.-G. Wendel. HEK293 and HEK293T cells 
(embryonic kidney), HeLa cells (cervical cancer) and A549 cells (lung adenocar- 
cinoma) were purchased from ATCC. Wild-type and DICER-knockout HCT116 
cells were provided by V. Narry Kim. BLCL, MEC1 and Raji cells were cultured in
RPMI with 20% FBS and 1% penicillin-streptomycin. HEK293, HEK293T, HeLa 
and A549 cells were cultured in DMEM with 10% FBS and 1% penicillin-strepto- 
mycin, whereas HCT116 cells were cultured in McCoy’s medium with 10% FBS 
and 1% penicillin-streptomycin. 

Western blotting. Cells were lysed on ice for 30 min with RIPA buffer (50 mM 
Tris pH 7.4, 150 mM NaCl, 1% NP-40, 1% Na-deoxycholate, 1 mM EDTA, 0.05% 
SDS), containing freshly added proteinase inhibitor cocktail (Thermo Scientific). 
For MGA, NUP98, SGK223 and DICER immunoblotting, cell lysates were run
using 3-8% Tris-Acetate NuPAGE gels with Tris-Acetate running buffer (Life 
Technologies). For CARD11, AKAP10, BAZ1B, SENP1, CUL3 and RIPK1, 4-12% 
Bis-Tris NuPAGE gels (Life Technologies) were run with MOPS running buffer 
and all other proteins were run with MES running buffer (Natural Diagnostics). 
The separated proteins were transferred to nitrocellulose membranes (Bio-Rad, 
1620252), blocked with Odyssey Blocking Buffer (Li-Cor, 927-40000) for 1 h at 
room temperature, followed by incubation with primary antibodies at 4 °C over- 
night. After two washes using PBS and 0.1% Tween 20 (PBST), the blots were 
incubated with IRDye-conjugated secondary antibodies for 50 min at room 
temperature. After one wash with PBST and two washes with PBS, proteins were 
detected with Odyssey CLx imaging system (Li-Cor). 

The following primary antibodies were used: anti-actin (mouse, Sigma, A4700; 
rabbit, Sigma, A2066), anti-AKAP10 (mouse, clone 51, Santa Cruz Biotechnology, 
sc-136512), anti-CARD11 (rabbit, Cell Signaling, 4440S), anti-DICER (rabbit, a gift 
from W. Filipowicz), anti-DNM1L (mouse, Abcam, ab56788), anti-MGA (rabbit,
H-286, Santa Cruz Biotechnology, sc-382569), anti-SFRS15 (SCAF4; mouse,
Abnova, H00057466-B01), anti-WSTF (BAZ1B; mouse, clone G-5, Santa Cruz
Biotechnology, sc-514287), anti-NUP98 (rabbit, Novus Biologicals, NB100-93325),
anti-SGK223 (mouse, Santa Cruz Biotechnology, sc-398164), anti-SENP1 (rabbit,
Bethyl Labs, A302-927A-T), anti-CUL3 (rabbit, Bethyl Labs, A301-108A-T),
anti-PAWR (Abcam ab92590), anti-RIPK1 (Cell Signaling 4926), anti-GAPDH
(goat, V-18, Santa Cruz Biotechnology) and anti-WNT5a/b (rabbit, clone C27E8,
Cell Signaling 2530). The secondary antibodies used included anti-mouse IRDye 
700 (donkey, Rockland Immunochemicals, 610-730-002), anti-rabbit IRDye 680 
(donkey, Li-Cor Biosciences, 926-68073), anti-rabbit IRDye 800 (donkey, Li-Cor 
Biosciences, 926-32213) and anti-mouse IRDye 800 (donkey, Li-Cor Biosciences, 
926-32212). 

RT-PCR of IPA isoforms. Total RNA was isolated using Tri reagent solution 
(Invitrogen AM9738) and digested with DNase I (Invitrogen AM1906). RNA 
was reverse transcribed using the qScript cDNA SuperMix (Quanta Biosciences 
101414-106). RT-PCR reactions were carried out using purified Taq polymerase 
using a 50 °C annealing temperature and 30 s extension at 72 °C. The linear range 
of amplification was determined by independent PCRs for each primer set. Primers 
were designed to be intron-spanning and are listed in Supplementary Table 3. 
Induction of IPA isoforms. Endogenous U2AF1, U2AF2 and hnRNPC were 
knocked down using pLKO-puro lentiviral vector-based shRNAs (Sigma). Virus 
was produced using the helper plasmids pCMV-VSVG and pCMV-dR8.2 and cells 
were transduced in six-well plates, selected with puromycin (2 μg ml⁻¹) for 5 days
and then collected for RT-PCR or western blot analysis. 

To induce IPA isoform expression of DICER, an antisense morpholino oligo- 
nucleotide (GeneTools) targeting the 5’ splice site of DICER exon 23 was added 
directly to sub-confluent HeLa cells at the indicated concentrations in the presence
of 6 μM EndoPorter-PEG delivery peptide (GeneTools) and harvested at the
indicated time points. The control morpholino was used at 12 μM concentration.
Knockdown of CARD11 full-length and IPA isoforms. Isoform-specific shRNA 
primers were cloned into the TRC2-pLKO-GFP plasmid using KpnI and EcoRI. 
Lentivirus was produced as described above and centrifuged at 25,000 r.p.m. for 
1h 45 min at 4 °C (Sorvall WX Ultracentrifuge). Pellets were resuspended and dis- 
solved in cold PBS overnight at 4 °C. The virus titre was estimated by transducing 
wild-type HEK293T cells. The 12-well culture plate was coated overnight with 
5 μg ml⁻¹ fibronectin. TMD8 cells were spin-infected and cultivated for three days,
followed by western blot analysis of FACS-sorted GFP-positive cells. 

Constructs. The V5-DICER construct was obtained from J. Mendell. To generate the 
DICER-IPA expression plasmid, the DICER-IPA cDNA was amplified from BLCL 
and cloned into the pCK-V5 plasmid using the BamHI and ApaI restriction sites.

The human MGA cDNA (Dharmacon, clone BC136659) was used to PCR- 
amplify the coding region of full-length MGA (8,571 nucleotides plus 6 nucle- 
otides of endogenous Kozak sequence) as well as MGA IPA (3,430 nucleotides 
(end of exon 9) plus GTGAGTATTAA (intronic sequence that will be translated, 
followed by a stop codon; see Extended Data Fig. 6a)). MGA IPA was cloned into 
the pcDNA3.1 expression vector (Life Technologies) using NheI and XhoI sites.
GFP fused-MGA IPA was generated by inserting MGA IPA downstream of eGFP
using the restriction sites BsrGI and XhoI in the pcDNA3.1-GFP vector. MGA
was cloned into pcDNA3.1-GFP using Gibson Assembly Cloning (New England 
Biolabs) from three pieces. 

The full-length FOXN3 mRNA was amplified from BLCL cDNA. To obtain
GFP-FOXN3, it was cloned into pcDNA3.1-GFP using BsrGI and XhoI restriction
sites. FOXN3 IPA was PCR-amplified from two fragments. Fragment 1 was
amplified from BLCL cDNA and corresponds to amino acids 1-180, whereas frag- 
ment 2 was amplified from genomic DNA from PBMC and corresponds to the 32 
amino acids generated from intronic sequence, followed by a stop codon. FOXN3 
IPA was fused with GFP at the C terminus as described above. 

Full-length CHST11 was amplified from BLCL cDNA, whereas CHST11 IPA 
was amplified from genomic DNA. Both were fused to GFP at the C terminus as 
described above. The integrity of all constructs was confirmed by sequencing. 
Functional validation of CLL-IPAs. CARD11 IPA. To assess NF-κB activation,
lentiviral-transduced TMD8 cells (described above) were used. Cells were fixed 
with 4% formaldehyde at room temperature for 15 min. After two washes with 
excess PBS, fixed cells were resuspended with ice-cold PBS and permeabilized with
90% methanol for 20 min on ice. Cells were then washed with cold PBS twice and 
resuspended with the incubation buffer (PBS + 0.5% BSA). Cells were aliquoted 
and incubated with anti-phospho-NF-κB p65 (1:1,500 dilution, Cell Signaling
3033) for 1.5 h at room temperature. Cells were washed with incubation buffer 
twice and incubated with fluorochrome-conjugated secondary antibody solution 
(1:10,000 Alexa Fluor 647 A27040, Invitrogen) for 15 min at room temperature. 
After two washes with incubation buffer, cells were analysed using a FACSCalibur.
DICER IPA. Full-length V5-DICER and V5-DICER IPA were immunoprecipitated 
from HEK293T cells as described before. In brief, 48 h after transfection, cells
were washed with cold PBS and lysed with IP buffer (20 mM Tris-HCl pH 8.0, 
150 mM NaCl, 1 mM EDTA, 0.5% NP-40 and 1x EDTA-free protease inhibitor 
(Thermo Fisher)) for 30 min on ice with occasional vortexing. The cell lysate was 
then centrifuged at 20,000g for 10 min at 4 °C and the supernatant was collected. 
The cell lysate was incubated with 3 μg of anti-V5 tag antibody (Invitrogen R960-25)
for 30 min on ice, then 900 μg of protein G Dynabeads were added and the reaction
was rotated for an additional 2 h at 4 °C. After five washes with IP buffer and twice 
in DICER assay buffer (20 mM Tris-HCl pH 8.0, 100 mM KCl, 0.2 mM EDTA), 
90% of beads were resuspended in DICER assay buffer for miRNA cleavage assay 
and the remaining beads were stored in 2x Laemmli sample buffer (Sigma) for 
western blot analysis. 

The miRNA cleavage assay was performed as described previously"®. In brief, 
synthesized pre-miRNA let-7i oligo (Dharmacon) was incubated with immunoprecipitated
beads prepared as described above in the enzymatic mixture (10 μl of
immunoprecipitated beads in DICER assay buffer, 2 μl of 20 mM MgCl2, 0.2 μl of
0.4 μM pre-miRNA, 0.1 μl of 100 mM DTT, 0.5 μl of RNaseOUT (Invitrogen) and
7.2 μl of RNase-free water) at 37 °C for 30 min with intermittent mixing. The reaction was
stopped by chilling samples on ice and analysed by northern blot. To investigate 
whether DICER IPA acts as a dominant-negative version of full-length DICER, 
different ratios of V5-DICER and V5-DICER IPA were mixed and tested with 
respect to miRNA cleavage. 

Reaction mixtures (10 μl) were added to 10 μl RNA loading buffer (95% formamide,
0.025% SDS, 0.025% bromophenol blue, 0.025% xylene cyanol FF, 0.5 mM
EDTA) and denatured at 95 °C for 5 min followed by quenching on ice. Samples
were run on a 15% TBE/urea gel followed by transfer to a Hybond-N+ nylon
membrane (GE Healthcare RPN303B) using a semi-dry transfer apparatus (Hoefer
TE70X). After transfer, membranes were briefly dried and then UV cross-linked
twice with 1,200 μJ cm⁻² each cycle. Cross-linked membranes were pre-hybridized
for 1 h at 37°C in ULTRAhyb-Oligo hybridization buffer (Ambion AM8663) in 
a rotary oven. DNA probes against the intended target RNA were synthesized as 
oligos and labelled with γ-32P-ATP in the presence of T4 polynucleotide kinase
(NEB M0201S) for 30 min at 37 °C. Labelled probes were purified through G-25
microspin columns containing Sephadex resin (GE Healthcare 27-5325-01).
Membranes were hybridized with labelled probe overnight at 37 °C in a rotary
oven. The next day, membranes were washed twice in 2× SSC/0.1% SDS for
5 min each at 37 °C followed by one wash in 0.1× SSC/0.1% SDS for 5 min at 37 °C.
Membranes were exposed to phosphorimager screens and scanned. 

To assess whether expression of DICER IPA influences miRNA expression 
in vivo, endogenous let-7 miRNA expression levels were measured by northern 
blot analysis of total RNA (221g) from wild-type and DICER knockout HCT116 
cells. DICER knockout HCT116 cells were transfected with different amounts of 
V5-DICER and V5-DICER IPA. Cells were harvested 3 days after transfection 
with Lipofectamine 2000 to assess DICER protein expression and corresponding 
endogenous let-7 levels. 

FOXN3 IPA. The fork-head domain of FOXN3 is necessary for transcriptional 
repression of FOXN3 target genes. Thus, truncation of the fork-head domain pre- 
dicts derepression of the target genes. Known target genes are PIM2 and MYC”. 
MEC1 cells were nucleofected with pcDNA3.1 vector containing GFP, GFP-FOXN3
or GFP-FOXN3 IPA using SF Cell Line 4D-Nucleofector X Kit (Lonza,
Program FF-120). After 48 h, GFP+ cells were FACS sorted, lysed immediately
(Cells-to-cDNA II Kit, Ambion) and RNA was extracted. cDNA was synthesized
by qScript cDNA SuperMix (Quanta Biosciences) and quantitative PCR was performed
using FastStart universal SYBR green master mix (Roche) on a 7900HT
Fast Real-Time PCR System (Applied Biosystems). The experiment was performed
from five biologically different replicates.

MGA IPA. Raji cells were nucleofected with pcDNA3.1 vector containing GFP, 
GFP-MGA or GFP-MGA IPA using Cell Line Nucleofector Kit V (Lonza, Program 
M-013). After 48 h, GFP+ Raji cells were FACS-sorted and lysed immediately in
lysis buffer (Cells-to-cDNA II Kit, Ambion) and RNA was extracted. cDNA synthesis
and qRT-PCR were performed as described for FOXN3. qRT-PCR was done in technical
triplicates from three biologically different experiments. MYC target genes were
previously published**°. E2F-binding sites in MYC target genes were identified 
using the Encode Transcription Factor ChIP-seq track, or they were previously 
described!??*". T-boxes were described for ATF4 and CDKN1B**"?. 

CHST11 IPA. 3'-seq data were used to identify overexpressed WNT proteins in CLL 
cells compared to normal B cells. The expression of WNT was validated in MEC1
cells by qRT-PCR. WNT5B was the WNT with the highest expression in MEC1 cells.

For WNT detection in media, MEC1 cells stably expressing GFP, GFP-CHST11
or GFP-CHST11 IPA were counted and washed once with RPMI without FCS. 
Twenty million cells were cultured in 10 ml RPMI plus 1% pen-strep in one 10-cm 
culture dish. After 18 h, conditioned medium was collected by centrifugation at 
280g for 5 min and passed through a 0.45-μm filter. The supernatant was concentrated
by an Amicon Ultra-4 centrifugal filter (Millipore, UFC800324) at 3,000g
at 10 °C for 2 h. The concentrated medium (~50 μl) was collected and subjected
to western blot analysis using anti-WNT5a/b antibody (Cell Signaling 2530). The 
corresponding cell pellets were also collected for western blot analysis. 

To assess paracrine WNT activity in MEC1 cells expressing CHST11 IPA, MEC1
cells were nucleofected with pcDNA3.1 vector containing GFP, GFP-CHST11 or
GFP-CHST11 IPA. After 24 h, GFP+ cells were FACS sorted and cultivated for
three days. The conditioned medium was collected and added to HEK293T cells 
which were transiently transfected with a WNT reporter plasmid (Addgene 12456, 
M50, Super 8x TOPFlash) or WNT reporter control plasmid with mutated TCF/ 
LEF binding sites (Addgene 12457, M51, Super 8x TOPFlash mutant). The conditioned
medium was added 24 h after transfection. Luciferase activity was measured
24 h after the addition of conditioned medium using a Glomax 96 Microplate 
Luminometer as described previously. 

Intersection of somatic mutations in CLL with IPA. CLL RNA-seq samples
(n = 44) with matching somatic DNA mutation and prognostic data were available
to us to map IPA isoform expression. The somatic mutations were obtained using
exome sequencing that included extended exon boundaries*®. We intersected the 
occurrence of somatic mutations with IPA isoforms in these samples. We focused 
on truncating mutations (nonsense mutations, frame-shift mutations and splice-site 
mutations) in expressed genes as they were likely to have a similar outcome as IPA. 

The IGVH status of CLL samples was assessed at MSKCC for the CLL samples 
studied by 3′-seq. The IGVH status of 44 RNA-seq samples was published’. 
Positions of TR mutations. The positions of TR mutations in CLL were obtained 
from the published CLL somatic mutation datasets*”*. The positions of TR muta- 
tions in solid cancers of TSGs and of genes targeted by CLL-IPAs were obtained 
from the MSK cbio portal (date of reference, 23 February 2018, containing >86,000 
cancer samples with 97% derived from solid tumours)*. The position with the high- 
est number of TR mutations was used (hot spot) and is indicated by the symbol. 
The symbol is lacking if the genes had TR mutations without a hot spot. 
Number of amino acids of full-length or IPA-generated truncated proteins. To 
calculate the number of amino acids of full-length proteins, we used the longest 
Ref-seq annotated mRNA isoform, obtained the number of coding nucleotides 
and divided this number by three to obtain the total number of amino acids. To 
calculate the number of amino acids of the IPA-generated truncated proteins we 
counted the number of nucleotides from the start codon to the end of the exon 
located upstream of the IPA isoform and divided this number by three to obtain the 
number of retained amino acids. This number also provided information about the 
reading frame of the protein at the exon/intron junction located upstream of the 
IPA isoform. We then used the correct reading frame and translated the intronic 
nucleotides until an in-frame stop codon was detected. The amino acids translated 
from intronic sequence were added to the retained amino acids to obtain the size 
of the IPA-generated truncated proteins. 
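
The calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: the function name and the toy sequences are hypothetical, and we assume that a codon split by the exon/intron junction is counted among the intron-derived amino acids.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def truncated_protein_length(upstream_cds, intron):
    """Return (retained_aa, intronic_aa, total_aa) for an IPA isoform.

    upstream_cds: coding sequence from the start codon to the end of the
                  exon located upstream of the IPA site (hypothetical input).
    intron:       intronic sequence that directly follows that exon.
    """
    # Whole codons encoded before the junction give the retained amino acids;
    # the remainder (0, 1 or 2 nt) defines the reading frame at the junction.
    retained_aa, leftover = divmod(len(upstream_cds), 3)

    # Continue in the same frame: leftover exonic nucleotides plus the intron.
    # Assumption: a codon split by the junction counts as intron-derived.
    start = len(upstream_cds) - leftover
    frame_seq = (upstream_cds[start:] + intron).upper()

    intronic_aa = 0
    for i in range(0, len(frame_seq) - 2, 3):
        if frame_seq[i:i + 3] in STOP_CODONS:
            return retained_aa, intronic_aa, retained_aa + intronic_aa
        intronic_aa += 1
    raise ValueError("no in-frame stop codon in the supplied intronic sequence")


# Toy example: the TAA at intron positions 2-4 is out of frame and is skipped;
# translation stops at the in-frame TAG.
print(truncated_protein_length("ATGGCC", "GTAAGTTAGCC"))
```

With real data the two inputs would come from the annotated transcript and the genome sequence; the sketch only makes the frame bookkeeping explicit.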

The fraction of retained CDR is the number of amino acids retained (up to the end 
of the exon located upstream of the IPA isoform) divided by the number of amino 
acids calculated from the longest mRNA isoform encoding the full-length protein. 
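
As a worked illustration of this ratio, the following minimal sketch uses made-up isoform names and nucleotide counts. It assumes that the "longest" isoform is the one with the most coding nucleotides and leaves open whether the stop codon is included in that count.

```python
def full_length_aa(coding_nt_by_isoform):
    """Amino acids of the longest isoform: coding nucleotides divided by three."""
    longest_coding_nt = max(coding_nt_by_isoform.values())
    return longest_coding_nt // 3


def retained_cdr_fraction(retained_aa, full_aa):
    """Fraction of the coding region retained upstream of the IPA site."""
    if full_aa <= 0 or not 0 <= retained_aa <= full_aa:
        raise ValueError("retained_aa must lie between 0 and full_aa")
    return retained_aa / full_aa


# Hypothetical gene with two annotated isoforms (coding nucleotides).
isoforms = {"isoform_1": 1440, "isoform_2": 1200}
full_aa = full_length_aa(isoforms)          # 1440 // 3 = 480 amino acids
print(retained_cdr_fraction(120, full_aa))  # 120 / 480 = 0.25
```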
Identification of known and novel TSGs. For known TSGs, we used the 301 
TSGs reported by Davoli et al.° that were expressed in CLL samples. Davoli et al. used 
a computational method (TUSON Explorer) to predict 301 TSGs from genomic 
sequencing data obtained from more than 8,200 cancers (>90% are derived from 
solid tumours). 

For novel TSGs, we used the data from the MSK cbio portal (see above). It was 
previously reported that the variable with the highest predictive power for TSGs
was the proportion of TR mutations to all mutations°. We calculated this ratio for
the 190 genes that generated CLL-IPAs in more than 20% of samples and identified 
a bimodal distribution with a separation point at 12% TR mutations to all muta- 
tions. The genes that generated CLL-IPAs in more than 20% of samples and had a 
TR mutation rate > 12% in the data from MSK cbio portal were called novel TSG 
candidates (Supplementary Table 2). 
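
The two thresholds stated above (CLL-IPA in more than 20% of samples, TR mutation rate above 12%) amount to a simple filter. The sketch below is illustrative only: the record layout, gene names and counts are hypothetical and are not taken from Supplementary Table 2.

```python
from dataclasses import dataclass


@dataclass
class GeneStats:
    name: str
    tr_mutations: int      # truncating (TR) mutations reported for the gene
    all_mutations: int     # all mutations reported for the gene
    samples_with_ipa: int  # CLL samples expressing the CLL-IPA
    samples_total: int     # CLL samples investigated


def is_novel_tsg_candidate(g, min_ipa_fraction=0.20, min_tr_ratio=0.12):
    """Apply both thresholds from the text; both comparisons are strict."""
    if g.all_mutations == 0 or g.samples_total == 0:
        return False
    ipa_fraction = g.samples_with_ipa / g.samples_total
    tr_ratio = g.tr_mutations / g.all_mutations
    return ipa_fraction > min_ipa_fraction and tr_ratio > min_tr_ratio


# Hypothetical genes with invented counts.
genes = [
    GeneStats("GENE_A", tr_mutations=30, all_mutations=200,
              samples_with_ipa=15, samples_total=59),
    GeneStats("GENE_B", tr_mutations=10, all_mutations=200,
              samples_with_ipa=15, samples_total=59),
    GeneStats("GENE_C", tr_mutations=30, all_mutations=200,
              samples_with_ipa=5, samples_total=59),
]
candidates = [g.name for g in genes if is_novel_tsg_candidate(g)]
print(candidates)
```

Only GENE_A passes both criteria here; GENE_B fails on the TR mutation rate and GENE_C on recurrence.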

To assess whether known TSGs are enriched among CLL-IPAs, a χ² test was 
performed. To exclude that this association occurred by chance, five control lists 
containing genes with similar coding region length and expression were generated 
and tested for enrichment of TSGs. 

Other statistical methods. To perform enrichment statistics, we used a χ² test and 
calculated the P value using a two-sided Fisher's exact test. To assess the functional 
differences between full-length proteins and IPA-generated truncated proteins 
(MGA and FOXN3), we used a two-sided t-test for independent samples. When 
comparing three groups (CARD11 and CHST11), a two-sided Kruskal-Wallis test 
was used. For subsequent pair-wise comparisons, a two-sided Mann-Whitney 
U-test was applied and the P values were adjusted with Bonferroni multiple testing 
correction. For all other tests that assessed the differences of features between two 
groups, we used a two-sided Mann-Whitney U-test. To investigate the spatial rela- 
tionship between the IPA-generated truncated proteins and hot spot TR mutations, 
we performed a two-sided Wilcoxon rank-sum test. 
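
For readers who want to see what two of these procedures compute, the sketch below implements a two-sided Fisher's exact test for a 2 x 2 contingency table and a Bonferroni adjustment using only the Python standard library. It is not the authors' analysis code. The two-sided P value is defined here as the sum of the probabilities of all tables that are no more likely than the observed one, which is a common convention that we assume rather than take from the text, and the example tables are arbitrary.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2 x 2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x counts in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at most as probable as the observed one; the small
    # tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


def bonferroni(p_values):
    """Multiply each P value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


print(fisher_exact_two_sided(8, 2, 1, 5))
print(bonferroni([0.01, 0.04, 0.5]))
```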

Reporting summary. Further information on research design is available in the 
Nature Research Reporting Summary linked to this paper. 

Data availability. All 3’-seq and RNA-seq data generated and analysed for this 
study have been deposited in the Gene Expression Omnibus (GEO) database under 
accession numbers GSE111310 and GSE111793. The code to analyse the data is 
available at https://bitbucket.org/leslielab/apa_2018/ and the processed data are 
available in Supplementary Table 1 (for Figs. 1b-d, 2a, 4a, Extended Data Figs. 3 
and 4) and Supplementary Table 2 (for Extended Data Fig. 8a), and in the Source 
Data files (for Figs. 1e, 2c, e, 3a, c, 4b-d, g, Extended Data Figs. 2c, 6j, 7c and 8a). 
Data on DNA mutations from patients with CLL were provided by D. A. Landau 
and need to be requested from him. The mutation data on solid cancers were 
obtained through the MSK cbio portal. The data can be accessed at
http://www.cbioportal.org.
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Extended Data Fig. 1 | Validation of IPA isoforms by independent 
methods and identification of CLL-IPAs used for further analysis. 

a, RNA-seq data were used to validate the presence of IPA isoforms using 
a GLM. Within two 100-nucleotide windows (green bars) separated by 51 
nucleotides and located up- and downstream of the IPA peak, the RNA-seq 
reads were counted. The IPA peak was considered validated if adjusted
P < 0.1 (see Methods). Out of n = 5,587 tested IPA isoforms, n = 1,662
were validated by this method. Shown is MGA as a representative example.
b, As only a fraction of IPA isoforms were validated by the method from
a, additional methods were used to obtain independent evidence for the
presence of the IPA isoforms. Independent evidence was obtained using
untemplated adenosines from RNA-seq data or through the presence of
the IPA isoform in other 3′-seq protocols’®. As the majority of immune
cell types used in this study have not been investigated using other
3′-seq protocols and IPA isoform expression is cell type-specific’, highly
expressed IPA isoforms (>10 TPM) were not excluded from further
analysis even if no read evidence was found by other protocols.
c, Hierarchical clustering based on IPA site usage separates the 3′-seq
dataset into four groups. It separates CD5+ B from CLL samples and
clusters CLL samples into three different groups. Shown is the usage
difference of the 20% most variable IPA isoforms across the dataset
(n = 342). Four out of thirteen CLL samples cluster away from the rest
of the samples and are characterized by a high number of IPA isoforms
(CLL high). d, The GLM (FDR-adjusted P < 0.1, IPA usage difference
> 0.05, IPA isoform expressed in CD5+ B < 8 TPM) identified 477
recurrent (significantly upregulated in at least 2 out of 13 CLL samples
by 3′-seq) and 454 non-recurrent (significantly upregulated in 1 out of 13
CLL samples by 3′-seq). IPAs were validated in an independent RNA-seq
dataset containing 46 new CLL samples. Among the recurrent IPAs, 71%
of testable IPAs were verified using another GLM (see a). Among the
non-recurrent IPAs, 64% of testable IPAs were verified. e, Plotting the number
of CLL-IPAs per sample separates the CLL samples investigated by 3′-seq
into two groups: 4 out of 13 samples generate a high number of CLL-IPAs
(CLL high, median of CLL-IPAs/sample, n = 100, range, 42-274),
whereas the rest of the samples generate lower numbers (CLL low, median, 
n=9, range, 5-28). Centre bar denotes the median; error bars denote the 
interquartile range. **P = 0.003, two-sided Mann-Whitney U-test. 
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Extended Data Fig. 3 | The 3’-seq and RNA-seq tracks of functionally 
validated CLL-IPAs. Five CLL-IPAs were functionally validated. Their
3′-seq and RNA-seq tracks are shown here and in Fig. 2a. Data are shown
as in Fig. 1b. The corresponding RT-PCRs are shown in Extended Data
Fig. 5a.
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Extended Data Fig. 5 | Validation of the IPA-generated truncated 
mRNAs and validation of their stable expression over time. a, Detection 
of full-length and IPA-generated truncated mRNAs by RT-PCR in 
normal B cells (CD5+ B, BLCL) and CLL cells used in the western blot 
validations shown in Fig. 2a and Extended Data Fig. 4. All experiments 
were performed twice with similar results. Primers to amplify the mRNA 
isoforms are located in the first and last exons shown in the gene models 
and are listed in Supplementary Table 3. HPRT was used as loading 
control. b, Induction of truncated mRNAs and proteins through shRNA- 
mediated knockdown of splicing factors. All experiments were performed 
twice with similar results. U2AF1 was knocked down in HeLa cells, 
U2AF2 was knocked down in HEK293 cells and hnRNPC was knocked 
down in A549 cells. Shown as in a, except for NUP96, which is shown
as in Extended Data Fig. 4. NUP96 is derived from NUP98 precursor.
Induction of DICER1 IPA by transfection of increasing amounts of antisense
morpholinos (MO) directed against the 5′ splice site of intron 23
of DICER1 in HeLa cells. Shown are RT-PCRs. c, RT-PCRs, performed
once, on expression of full-length and IPA isoforms for eight CLL-IPAs in
samples from two patients with CLL and control B cells (CD5+ B, BLCL).
The samples were collected over a time interval of over 6 years. CLL11: 
T1, 17 months after diagnosis, T2, 24 months, T3, 44 months; CLL6: T1, 
16 months, T2, 49 months, T3, 91 months (42 months after treatment). 
Samples from all time points (except CLL6, T3) were obtained from 
untreated patients. The primers for amplifications of the products were 
located in the first and last exons shown in the gene models and are listed 
in Supplementary Table 3. Expression of HPRT serves as loading control. 
The same gel picture of HPRT is shown in Fig. 3b for CLL samples and in 
a, far right panel, for BLCL and CD5+ control samples. All tested CLL-IPA
isoforms were detectable at several time points during the course of the
disease. Compared with CD5+ B cells, expression of FCHSD2 IPA was
not significantly upregulated in CLL. d, Western blots of full-length and 
IPA-generated truncated proteins from CARD11, DICER and SCAF4. All 
experiments were performed twice with similar results. Actin was used as 
loading control. Shown are samples from normal B cells (BLCL) and two 
patients with CLL, both at two different time points 0.5-10 months apart. 
For gel source data, see Supplementary Fig. 1. 
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Extended Data Fig. 6 | IPA-generated truncated proteins resemble 

the protein products of truncating DNA mutations and have cancer-promoting
properties. a, CARD11 IPA results in translation of intronic
nucleotides (grey) until an in-frame stop codon is encountered. This 
results in the generation of 16 new amino acids (grey) downstream of 
exon 10. In the case of MGA IPA, three new amino acids downstream of 
exon 9 are generated. b, Western blot showing that TMD8 cells express 
similar amounts of CARD11 IPA as CLL samples. The western blot is 
shown as in Fig. 2a and was performed twice. Actin was used as loading 
control. c, Western blot (as in b) showing full-length CARD11 as well as 
CARD11 IPA in TMD8 cells expressing a control shRNA (Co), an shRNA 
that exclusively knocks down the full-length protein and two different 
shRNAs that exclusively knock down the CARD11 IPA isoform. The 
experiment was performed twice with similar results. GAPDH was used as 
loading control. d, Endogenous phospho-NF-κB p65 levels were measured 
by FACS in TMD8 cells expressing the indicated shRNAs from c. Mean 
fluorescent intensity values are shown in parentheses in FACS plots of a 
representative experiment out of three. e, Immunoprecipitation of V5- 
DICER or V5-DICER IPA from HEK293T cells using an anti-V5 antibody. 
The experiment was performed twice with similar results. 2.5% of input 
was loaded. f, The extent of miRNA processing depends on the expression 
levels of full-length DICER, but not IPA. Shown are wild-type (WT) and 
DICER knockout (KO) HCT116 cells. Re-expression of different amounts
of full-length DICER1 protein in the knockout cells (measured by western
blot of DICER1 in the top panel) results in different levels of endogenous
let-7 expression (measured by northern blot in the bottom panel; compare 
lanes 3 and 4). Expression of DICER IPA has no influence on miRNA 
processing (compare lanes 4 and 5). Actin and U6 were used as loading 
controls. The experiment was performed twice with similar results. 

g, Western blot of MGA. MGA and MGA IPA were cloned and expressed 
in HEK293T cells to confirm the predicted protein size. The experiment 
was performed twice with similar results. Shown is also the endogenous 
MGA expression in Raji cells. Actin was used as loading control on the 
same blot. Asterisk denotes an unspecific band. h, Protein models of full- 
length and FOXN3 IPA are shown as in Fig. 2b. The IPA-generated protein 
truncates the fork-head domain and is predicted to lose the repressive 
activity. i, As in a, but for FOXN3. FOXN3 IPA generates 32 new amino 
acids downstream of exon 2. j, FOXN3 IPA significantly derepresses 
expression of the oncogenic targets MYC and PIM2. Fold change in 
mRNA level of endogenous genes in MEC1 B cells after transfection of 
GFP-FOXN3 IPA compared with transfection of full-length GFP-FOXN3. 
HPRT-normalized values are shown as box plots (as in Fig. 1e) from
n = 5 biologically independent experiments, each performed in technical
triplicates. **P = 0.002, two-sided t-test for independent samples. For gel
source data, see Supplementary Fig. 1. 
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Extended Data Fig. 7 | Inactivation of TSGs by CLL-IPAs independently 
of DNA mutations. a, The distribution of full-length protein size of
genes that generate CLL-IPAs (n = 306) and B-IPAs (n = 2,690) is shown
in amino acids. Box plots are as in Fig. 1e. P = 0.87, two-sided Mann-
Whitney U-test. b, TR rate (ratio of TR mutations compared to total
mutations) is shown for known TSGs obtained previously°. Box plots are
as in Fig. 1e. P=1 x 107}°9, two-sided Mann-Whitney U-test. c, Known
TSGs, obtained previously” that are targeted by CLL-IPAs (n = 21) are 
shown. Dark green bars indicate the fraction of retained CDRs for each 
IPA-generated protein. Black dots indicate the hot spot positions of 

TR mutations obtained from MSK cbio portal. CLL-IPAs mostly occur 
upstream or within 10% (of overall amino acid length) of the mutations. 
P=0.04, two-sided Wilcoxon rank-sum test. d, Contingency table for 
enrichment of TSGs among genes that generate CLL-IPAs. P value 

was obtained from two-sided Fisher’s exact test. TSGs were obtained 
previously®. e, TSGs and genes that generate CLL-IPA isoforms have 
longer CDRs than genes that do not generate IPA isoforms. Box plots are 
as in Fig. 1e. P=1 x 10~*®°, two-sided Kruskal-Wallis test. f, Five control 
gene lists (n = 306, each) with a similar size distribution as CLL-IPAs 

and expressed in CLL were tested for enrichment of TSGs. Shown is the 
number of TSGs found. A χ² test did not show a significant enrichment of 
TSGs among the control genes. g, Contingency table for enrichment of TR 
mutation genes in CLL among genes that generate CLL-IPAs. P value was 
obtained from two-sided Fisher’s exact test. h, ZMYM5 is truncated by a 
TR mutation and an IPA isoform in the same patient, but the aberrations 


are predicted to result in different truncated proteins. A 10-bp deletion 

in exon 3 results in a frameshift leading to the generation of a truncated 
ZMYM5 protein, whereas ZMYM5 IPA (not yet annotated) produces a
truncated protein containing 352 more amino acids in the same patient.
The genes shown in h and i are the only genes with simultaneous presence
of a TR mutation and CLL-IPA out of n = 268 tested. The position of the
TR mutation is indicated in green. CLL7 and CLL11 3′-seq and RNA-seq
tracks are shown for comparison reasons. i, MGA is truncated by a TR 
mutation and an IPA isoform in the same patient. The TR mutation affects 
the 5’ splice site of intron 7, thus generating two additional amino acids 
downstream of exon 7, whereas the IPA isoform encodes a truncated 
MGA protein containing three more amino acids downstream of exon 9. 
Mutation and 3’-seq analysis were performed once. CLL7 and CLL11 are 
shown for comparison reasons. j, Shown are additional recurrent (n > 1) 
DNA mutations found by exome sequencing of CLL patient samples 
stratified by a high or low number of CLL-IPAs per patient. Only the top 
and bottom 16 samples with high or low CLL-IPAs are shown to normalize 
the number of samples analysed. This analysis is only descriptive and 

no test was performed. k, Significant enrichment of SF3B1 mutations in 
the group of CLL samples with abundant CLL-IPA isoforms. Two-sided 
Mann-Whitney U-test was performed. l, Abundance of CLL-IPAs is not
associated with IGVH mutational status. Shown is the number of CLL-
IPAs per sample for patients with mutated (MUT, n = 30) or unmutated
(UN, n = 21) IGVH genes. Box plots are as in Fig. 1e. P = 0.4, two-sided
Mann-Whitney U-test. 
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Extended Data Fig. 8 | Novel TSG candidates and validation of CHST11 
IPA as cancer-promoting isoform. a, As in Fig. 3c, but shown are known 
(red gene names) and novel TSG candidates (black gene names) among 
the abundant CLL-IPAs. CLL-IPAs seem to inactivate these genes as they 
mostly occur upstream or within 10% (of overall amino acid length) of the 
mutations. P = 1 × 10⁻⁸, two-sided Wilcoxon rank-sum test performed on
all 136 TSGs; P = 1 × 10⁻⁸, two-sided Wilcoxon rank-sum test performed
on the novel TSGs, n= 119. Position of the TR mutation was determined 
using the data obtained from the MSK cbio portal and indicates the hot 
spot mutation. Right, the fraction of CLL samples affected represents 

the fraction of CLL samples (out of 59) with significant expression of the 
IPA isoform. Genes were included if they were affected in at least 20% of 
samples investigated either by 3'-seq or RNA-seq. b, Contingency table for 
enrichment of novel TSGs among highly recurrent CLL-IPAs. P value was 
obtained from two-sided Fisher's exact test. c, TSGs have larger protein 
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sizes. Box plots are as in Fig. le. **P= 0.005, two-sided Mann-Whitney 
U-test. The increased overall mutation rate of known TSGs correlates 
with larger protein size. P=1 x 10~°, Spearman's correlation coefficient, 
r=0.74. d, CHST11 IPA generates 18 new amino acids (grey) downstream 
of exon 1. e, Experimental set-up to measure paracrine WNT activity 
produced by MEC1 B cells either expressing GFP, GFP-CHST11 or 
GFP-CHST11 IPA and using a WNT reporter expressed in HEK293T 
cells. Primary CLL cells and the CLL cell line MEC1 express several 
WNTs, including WNT5B. In the presence of CHST11 WNT (red dots) 
binds to sulfated proteins on the surface of WNT producing cells, whereas 
WNT is secreted into the medium in the presence of CHST11 IPA. WNT- 
conditioned medium activates a WNT reporter in HEK293T cells. This 
set-up refers to Fig. 4f, g. f, Western blot, performed once, for WNT5 
shown as in Fig. 4f, but including HeLa cells as positive control for WNT5 
expression. Actin was used as loading control on the same blot. 
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a, CLL sample characteristics. b, Normal human immune cells investigated by 3’-seq. c, Normal human immune cells investigated by RNA-seq. 
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The NORAD IncRNA assembles a topoisomerase 
complex critical for genome stability 


Mathias Munschauer*, Celina T. Nguyen, Klara Sirokman, Christina R. Hartigan, Larson Hogstrom, Jesse M. Engreitz,
Jacob C. Ulirsch, Charles P. Fulco, Vidya Subramanian, Jenny Chen, Monica Schenone, Mitchell Guttman,
Steven A. Carr & Eric S. Lander*


The human genome contains thousands of long non-coding RNAs’, 
but specific biological functions and biochemical mechanisms 
have been discovered for only about a dozen”~’. A specific long 
non-coding RNA—non-coding RNA activated by DNA damage 
(NORAD) —has recently been shown to be required for maintaining 
genomic stability®, but its molecular mechanism is unknown. Here 
we combine RNA antisense purification and quantitative mass 
spectrometry to identify proteins that directly interact with NORAD 
in living cells. We show that NORAD interacts with proteins involved 
in DNA replication and repair in steady-state cells and localizes 
to the nucleus upon stimulation with replication stress or DNA 
damage. In particular, NORAD interacts with RBMX, a component 
of the DNA-damage response, and contains the strongest RBMX- 
binding site in the transcriptome. We demonstrate that NORAD 
controls the ability of RBMX to assemble a ribonucleoprotein 
complex—which we term NORAD-activated ribonucleoprotein 
complex 1 (NARC1)—that contains the known suppressors of 
genomic instability topoisomerase I (TOP1), ALYREF and the 
PRPF19-CDC5L complex. Cells depleted for NORAD or RBMX 
display an increased frequency of chromosome segregation 
defects, reduced replication-fork velocity and altered cell-cycle 
progression—which represent phenotypes that are mechanistically 
linked to TOP1 and PRPF19-CDC5L function. Expression of 
NORAD in trans can rescue defects caused by NORAD depletion, 
but rescue is significantly impaired when the RBMX-binding site 
in NORAD is deleted. Our results demonstrate that the interaction 
between NORAD and RBMX is important for NORAD function, and 
that NORAD is required for the assembly of the previously unknown 
topoisomerase complex NARC1, which contributes to maintaining 
genomic stability. In addition, we uncover a previously unknown 
function for long non-coding RNAs in modulating the ability of an 
RNA-binding protein to assemble a higher-order ribonucleoprotein 
complex. 

NORAD stands out among long non-coding RNAs (lncRNAs)
because it (1) is highly conserved relative to other lncRNAs, (2) is
abundantly expressed in many cell types, (3) is upregulated upon DNA
damage and (4) induces chromosomal instability and aneuploidy when
deleted. This phenotype is intriguing as little is known about the roles
of lncRNAs in maintaining a stable genome. A model for lncRNA
function suggests that lncRNAs can serve as assembly scaffolds for
ribonucleoprotein complexes®’, yet this model has been explored in
only a few cases. The mechanism that connects the NORAD lncRNA
to chromosomal instability remains unknown. 

Two recent studies have reported PUMILIO, a highly abundant cyto- 
plasmic RNA-binding protein with no known role in genomic stability, 
as the sole NORAD-interacting protein®’. However, these results were 
obtained from in vitro mixing of exogenous NORAD fragments with 


cytoplasmic extracts, which may not accurately represent the protein 
contacts of NORAD in living cells (Supplementary Note 1). 

To reveal the direct interactions of NORAD with proteins in live 
cells, we captured and identified NORAD-interacting proteins by 
combining RNA antisense purification (RAP) with quantitative liquid 
chromatography-mass spectrometry using isobaric mass tag quantification
(RAP MS) (Fig. 1a). HCT116 colon carcinoma cells were treated
with 365-nm light after 4-thiouridine labelling!®, which covalently
crosslinks proteins to RNA but not to other proteins. lncRNA-protein
complexes were purified by RNA hybrid selection with
antisense oligonucleotides that target NORAD, under denaturing 
and reducing conditions at high temperature to minimize the co- 
purification of indirectly bound proteins? (Fig. 1a). To identify specific 
interactors with NORAD, we quantitatively compared the resulting pro- 
teins to those captured in purifications with antisense oligonucleotides 
that target the well-characterized 'RNA component of mitochondrial
RNA processing endoribonuclease' (RMRP), which is not expected to
interact with the same proteins as NORAD ''. We analysed biological 
replicate purifications in a single 4-plex iTRAQ cassette, quantifying 
1,361 proteins that each had more than two unique peptides (Fig. 1b). 
The control purification captured about 85% of RMRP transcripts 
(Extended Data Fig. 1) and enriched the target RNA approximately 
550-fold versus input RNA. We found 12 strongly enriched proteins 
(mean log2(iTRAQ ratio (NORAD/RMRP)) < −1.6, P < 0.05, moderated
t-test) (Fig. 1c), including 8 of the 10 known core components of
the RMRP complex!! and one previously identified candidate rRNA- 
and/or tRNA-processing factor’. 

We then analysed NORAD antisense purifications. Experiments 
captured 82% of endogenous NORAD (Extended Data Fig. 1) (about 
80-fold enrichment versus input RNA). We reproducibly identified 
45 proteins that met our enrichment criteria (mean log2(iTRAQ ratio
(NORAD/RMRP)) > 1.6, P < 0.05, moderated t-test) (Fig. 1b). This set
of proteins is highly specific to NORAD, in that 41 out of the 45 proteins 
(Fig. 1c) were not among 219 promiscuous binders (Supplementary 
Note 3). The RNA-binding protein PUMILIO2 (PUM2) was indeed 
present in our NORAD interactome, but it ranked 185th out of 
the 265 proteins we detected (mean log2(iTRAQ ratio (NORAD/ 
RMRP)) > 0.5) and did not meet our cut-off for strongly enriched 
proteins. 
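The enrichment criteria described above can be written as a short filter. The following Python sketch is illustrative only: the `Protein` record, the function name and the example values are hypothetical, and the adjusted P values are assumed to come from a moderated t-test run elsewhere (that test is not implemented here).

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Protein:
    name: str
    log2_ratios: tuple  # log2(iTRAQ ratio NORAD/RMRP), one value per replicate
    p_value: float      # adjusted P value from a moderated t-test (computed elsewhere)


def strongly_enriched(proteins, min_log2=1.6, max_p=0.05):
    """Return proteins whose mean log2 ratio exceeds min_log2 with P below max_p."""
    return [p for p in proteins
            if mean(p.log2_ratios) > min_log2 and p.p_value < max_p]


# Made-up values for illustration; these are not measurements from the study.
example = [
    Protein("protein_A", (2.1, 1.9), 0.01),  # passes both thresholds
    Protein("protein_B", (0.6, 0.5), 0.03),  # significant, but below the ratio cut-off
    Protein("protein_C", (2.4, 2.0), 0.20),  # high ratio, but not significant
]
hits = [p.name for p in strongly_enriched(example)]
print(hits)
```

Under the same assumptions, the RMRP-enriched set would be obtained by reversing the comparison (mean log2 ratio below −1.6).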

Notably, many of the 41 NORAD-interacting proteins have key roles 
in nuclear processes such as DNA unwinding, replication and repair 
(including PURA, PURB, TAF15, ALYREF, SFPQ, SRSF1, RBM14,
DDX17, RBMX and its retrogene RBMXL1). Twenty-nine (71%) of 
the forty-one proteins localize to the nucleus, nucleoplasm or chro- 
matin, whereas only two (5%) localize exclusively to the cytoplasm 
(Fig. 1d). The interactome thus points towards an important nuclear 
function of NORAD. 
Biology, Harvard Medical School, Boston, MA, USA. *e-mail: mathias@broadinstitute.org; eric@broadinstitute.org


132 | NATURE | VOL 561 | 6 SEPTEMBER 2018 


© 2018 Springer Nature Limited. All rights reserved. 




Fig. 1 | NORAD directly binds many nuclear proteins in living cells.
a, Schematic overview of RAP MS. b, Quantification of NORAD- and
RMRP-interacting proteins. Scatter plot of log2-transformed iTRAQ ratios
from two biological replicates is shown. Adjusted P value, two-tailed
moderated t-test (Supplementary Note 2). Rep, replicate. c, iTRAQ ratios
of RMRP- (blue) and NORAD- (red) enriched proteins. Columns represent
the mean of two biological replicate experiments, individual data points
are shown (Supplementary Tables 1, 2). d, Subcellular localization of
NORAD-interacting proteins.

Given the overrepresentation of nuclear proteins, we used single-molecule
RNA fluorescent in situ hybridization (smRNA FISH) to assess the
subcellular localization of NORAD in intact cells. In contrast to previous
reports that characterized NORAD as being located exclusively in the
cytoplasm’, we found that on average 40-50% of NORAD transcripts in
HCT116 cells reside in the nucleus (Fig. 2a and Extended Data Fig. 2a, b).
We confirmed the nuclear localization by subcellular fractionation and
quantitative PCR with reverse transcription (RT-qPCR) (Extended Data
Fig. 2c, d). Notably, when cells were challenged with DNA damage and
replication stress, NORAD was upregulated (Extended Data Fig. 2e) and its
nuclear localization increased markedly (to about 85%), whereas the
localization patterns of control RNAs were unaffected (Fig. 2a, b). Given
this shift in localization, we performed RAP experiments with and without
DNA damage to confirm that the interactions of NORAD with several
candidate binders also occur under conditions of DNA damage (Extended Data
Fig. 2f, g).

Among the NORAD-interacting proteins, we focused on RBMX, the knockdown
phenotype of which (impaired DNA damage repair!’ and premature
sister-chromatid separation") is closely related to the previously reported
NORAD knockout phenotype’®. To explore this connection, we quantified the
frequency of chromosome-segregation defects in response to depletion of
NORAD or RBMX by imaging mitotic cells. We achieved >90% reduction in NORAD
expression (estimated by RT-qPCR, RNA-sequencing and smRNA FISH) by CRISPR
interference (KRAB-dCas9) targeted to the NORAD promoter (Fig. 2a and
Extended Data Fig. 3a, b). For both wild-type and knockdown cells, we
imaged 100 anaphase nuclei and calculated the frequency of DAPI-positive
anaphase bridges. Consistent with previous reports’, NORAD depletion caused
a significant increase (2.2-fold) in segregation defects (Fig. 2c, d).
Importantly, these defects were rescued by expression of full-length NORAD
in trans (Fig. 2d), indicating that the defects are dependent on the NORAD
RNA. Depletion of RBMX (Extended Data Fig. 3a) caused a comparable increase
(2.6-fold) in the frequency of anaphase bridges (Fig. 2d). By contrast,
depletion of the cytoplasmic protein PUM2 (Extended Data Fig. 3a) caused no
substantial increase in segregation defects (Extended Data Fig. 3c). We
reasoned that the interaction between NORAD and RBMX may hold important
mechanistic insights into NORAD function.

To explore this interaction, we mapped RBMX-binding sites on NORAD by
crosslinking and immunoprecipitation (CLIP). We covalently coupled proteins
to RNA using ultraviolet crosslinking'® and immunopurified RBMX with a
specific antibody. We isolated and sequenced RNA crosslinked to RBMX. RBMX
displays unusually
strong and specific binding to the 5’ end of NORAD (Fig. 2e). The 
RBMX-binding site in NORAD extends over more than 800 nucleotides
and covers about 15% of NORAD—making it eight times larger than 
the majority of RBMX-binding sites (Extended Data Fig. 3d) and the 
strongest RBMX-binding region in the transcriptome (Fig. 2f). This 
unusual binding pattern suggests that NORAD serves as a high-affinity 
binding target for RBMX and contains many RBMX-binding sites. A 
multiple sequence alignment of NORAD transcripts, which was assem- 
bled de novo from RNA-sequencing data from 11 mammalian species 
(Extended Data Fig. 3e), suggests that the RBMX-binding region in 
NORAD is transcribed and conserved throughout mammalian evo- 
lution. Next, we performed CLIP for three additional RNA-binding 
proteins and showed that the RBMX-binding region does not bind 
PUMILIO, FUBP1 or FUBP3 (Extended Data Fig. 3f). 

To confirm that the NORAD-RBMX interaction occurs in the 
nucleus, we performed RBMX RNA immunoprecipitation (RIP) in 
nuclear and cytoplasmic extracts and showed that over 99% of the total 
RBMX RIP signal is indeed nuclear (Extended Data Fig. 3g). Consistent 
with this result, immunofluorescence microscopy suggests that RBMX 
localizes exclusively to the nucleus (Extended Data Fig. 3h). Finally, 
depletion of RBMX did not affect subcellular localization of NORAD 
(Extended Data Fig. 3i). 

We speculated that NORAD might use its large RBMX-binding site to 
assemble a ribonucleoprotein complex. To examine the role of NORAD 
in such a complex, we sought to identify proteins that bind RBMX and 
determine whether their interaction with RBMX was dependent on 
NORAD. We performed co-immunoprecipitation and mass spectro- 
metry (co-IP MS) experiments and compared the quantitative enrich- 
ment of RBMX-interacting proteins in cells with and without NORAD 
knockdown (Fig. 3a). Importantly, we used a nonspecific RNA and 
DNA nuclease (benzonase) to ensure that RBMX-binding is direct, 
rather than being mediated by RNA. 

Among the top 11 proteins that bound to RBMX only in the presence 
of NORAD, 7 are linked to DNA replication or repair (Fig. 3b). 

Six of these proteins (TOP1, TOP1MT, PRPF19, CDC5L, BCAS2
and MEPCE) were not detected in NORAD RAP MS data or were not
among the top 200 enriched proteins, which suggests that they bind
directly to RBMX and do not interact strongly with NORAD (Extended
Data Fig. 4a). We further confirmed by western blot the absence of
TOP1 in NORAD antisense purifications (Extended Data Fig. 4b) and
showed that levels of TOP1, RBMX, PRPF19 and CDC5L proteins
were not changed upon NORAD depletion (Extended Data Fig. 4c).
PRPF19, CDC5L and BCAS2, together with PLRG1, make up the core
of the human PRPF19-CDC5L complex, and both PRPF19-CDC5L
and TOP1 have important roles in DNA replication and genomic
stability, as previously reviewed'>'©. TOP1 suppresses genome instability by
preventing interference between replication and transcription!’. This
involves relieving torsional stress in DNA (that is, supercoiling) and
suppressing the accumulation of RNA-DNA hybrids (R-loops)'®. Both
R-loops and supercoiled DNA impair replication-fork progression and
can lead to genomic instability!”””. Stalled replication forks activate the
DNA-damage response through ATR signalling. CDC5L binds and
activates ATR”®, and the E3 ligase PRPF19 ubiquitylates RPA, enhancing
ATRIP-ATR recruitment to stalled replication forks”).

The precise roles of the remaining two NORAD-dependent RBMX 
interactors in maintaining genomic stability are less well understood. 
MEPCE binds to the 5’ cap of 7SK”? and was reported in several 
studies that aimed to identify proteins involved in the DNA dam- 
age response!*; however, its exact function in this process remains 
unknown. Unlike the six proteins above that were not found by RAP
to interact strongly with NORAD, a seventh protein, ALYREF, was
identified as a strong NORAD binder. ALYREF is part of the human
TREX complex and interacts with the 5’ end of many RNAs, including
NORAD (Extended Data Fig. 4d), to facilitate RNA export from the
nucleus”*. ALYREF contributes to genomic stability by suppressing
R-loops™, as does TOP1.

Fig. 3 | NORAD modulates RBMX and is essential for assembly of a
topoisomerase complex. a, Illustration of RBMX co-IP MS experiments.
b, Quantification of RBMX-interacting proteins in wild-type versus
NORAD knockdown cells. Mean log2 TMT ratios from two biological
replicates are shown. Adjusted P value, two-tailed moderated t-test
(Supplementary Note 2). Purple, NORAD-dependent RBMX interactors
functionally linked to DNA replication and repair. Inset, illustration of
PRPF19-CDC5L complex. c, Western blot of Flag-RBMX-V5 co-IP
followed by size-exclusion chromatography. Fractions 4-6 were pooled
for mass spectrometry (Supplementary Table 5). Fraction 6 and fraction
20 were used for RNA sequencing. Western blots are representative of
one experiment; three independent experiments were performed. SEC,
size-exclusion chromatography. d, Proximity ligation assay for RBMX and
TOP1. Two different antibody pairs were used. Centre lines, medians; box
limits, 25th and 75th percentiles; whiskers, 5th and 95th percentiles; dots,
outliers. ****P < 0.0001, ***P < 0.001, NS, not significant, two-tailed
Mann-Whitney U test. Sample size antibody pair 1: RBMX WT, n = 102;
RBMX KD, n = 193; NORAD WT, n = 521; NORAD KD, n = 559; rescue
full length, n = 176; rescue 5P (5’ RBMX-binding site) deletion, n = 171.
Sample size antibody pair 2: NORAD WT, n = 209; NORAD KD, n = 232;
rescue full length, n = 211; rescue 5P deletion, n = 290. Representative
images are shown. Scale bar, 20 µm.

We performed reciprocal co-IP and western blots to confirm that 
TOP1, ALYREF and CDC5L interact with RBMX and also contact each 
other (Extended Data Fig. 4e), suggesting that these proteins may con- 
stitute a complex. To test whether such a complex exists, we generated 
cell lines that express epitope-tagged RBMX and performed co-IP 
experiments (using benzonase to digest unprotected RNA and DNA) 
followed by native elution and size-exclusion chromatography. Western
blot analysis of size-fractionated co-IP samples showed that RBMX,
TOP1 and PRPF19 are part of a 700-1,000-kDa complex (Fig. 3c and
Extended Data Fig. 4f). The majority of TOP1 in this complex displays
an approximately 50-kDa size shift, consistent with a known SUMO-1
modification of TOP1 proteins that are associated with transcriptionally
active or replicating chromatin’’. Mass spectrometry confirmed
that RBMXL1, which is encoded by an RBMX retrogene, is a component
of this complex in addition to RBMX, TOP1 and PRPF19
(Fig. 3c). Finally, we speculated that this complex protects NORAD
from benzonase digestion. We constructed sequencing libraries using
RNA extracted from various size-exclusion chromatography fractions.
Notably, RNA footprints that matched the previously identified
RBMX-binding site in NORAD were present only in fractions that contained
the complex (Fig. 3c). These data demonstrate that NORAD is a
physical part of the captured complex.

We next used proximity ligation assays to show that the RBMX-TOP1
interaction occurs in the nucleus, is disrupted by NORAD
depletion and is rescued by re-introducing full-length NORAD
into NORAD-depleted cells (Fig. 3d and Extended Data Fig. 4g).
Importantly, rescue is strongly impaired if the rescue construct lacks
the RBMX-binding region (Fig. 3d and Extended Data Fig. 4g).

Our results indicate that NORAD modulates the ability of RBMX to
interact with other proteins that appear not to bind NORAD directly,
namely TOP1 and the core PRPF19-CDC5L complex. Given the
distinct molecular composition of this NORAD-dependent RBMX
complex and the functional importance of its components, we name it
NORAD-activated ribonucleoprotein complex 1 (NARC1).

Many NARC1 components have prominent roles in maintaining
genomic stability. Although individual components such as RBMX or
PRPF19 have been reported to contribute to mRNA splicing!" we did
not observe global changes in mRNA splicing upon NORAD depletion
in HCT116 cells (Extended Data Fig. 5a). In other cell types, RBMX
and CDC5L can influence the expression of BRCA2 and BRCA1'*”°.
However, BRCA1 and BRCA2 were not among differentially expressed
genes (Supplementary Table 7) and their protein levels were not
noticeably different in NORAD-depleted and normal HCT116 cells
(Extended Data Fig. 4c).

Fig. 4 | Depletion of NORAD and NARC1 components affects
replication-fork velocity and cell-cycle progression. a, Replication-fork
velocity measured by DNA combing (Supplementary Tables 8-10). Centre
lines, medians; box limits, 25th and 75th percentiles; whiskers, 5th and
95th percentiles; dots, outliers. ****P < 0.0001, two-tailed Mann-Whitney
U test. Sample size: NORAD WT, n = 280; NORAD KD, n = 428; RBMX
WT, n = 438; RBMX KD, n = 296. b, Cell-cycle analysis by flow cytometry.
CRISPR interference was used to deplete RBMX and NORAD (n = 4;
Supplementary Table 11). Fluorescence-activated cell sorting histograms
are shown in Extended Data Fig. 5e. c, As in b, but for TOP1. RNA
interference was used to deplete TOP1 (n = 6; Supplementary Table 12).
Fluorescence-activated cell sorting histograms are shown in Extended
Data Fig. 5f. Values are mean ± 95% confidence interval. ****P < 0.0001,
***P < 0.001, *P < 0.05, NS, not significant, two-tailed Welch's t-test.
d, Model illustrating NORAD function.

Given the essential role of NORAD in assembling NARC1, we
speculated that NORAD depletion may cause a TOP1-related phenotype
and directly or indirectly alter DNA replication, which can lead to
chromosome segregation defects and genomic instability’””*. To assess the
functional consequence of NORAD depletion on replication, we used
the DNA combing technique and measured replication-fork velocity
at the single-molecule level. Analysis of over 250 replication forks in
wild-type and knockdown cells confirmed that NORAD and RBMX
depletion significantly reduced replication-fork velocity (Fig. 4a); the 
observed effect size was comparable to previously published TOP1
knockdown data!*. Thus, NORAD may directly or indirectly affect 
DNA replication even in the absence of additional DNA damage 
stimulus. 
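The fork-velocity comparison in Fig. 4a relies on a two-tailed Mann-Whitney U test. To illustrate what that test computes, here is a minimal pure-Python sketch that uses the normal approximation without tie or continuity correction. This is a simplification chosen for brevity: the software and settings the authors used are not stated in this section, and the function name and example numbers are hypothetical.

```python
import math


def mann_whitney_u(x, y):
    """Two-tailed Mann-Whitney U test via the normal approximation.

    Simplified sketch: tied values receive average ranks, but the variance
    is not tie-corrected and no continuity correction is applied.
    """
    n1, n2 = len(x), len(y)
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j over a run of equal values so they share one average rank.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2
    u_y = n1 * n2 - u_x
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_x - mu) / sigma
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed P value
    return u_x, u_y, p


# Made-up fork velocities for illustration only (arbitrary units).
wild_type = [1.9, 2.1, 2.4, 2.2, 2.0, 2.3]
knockdown = [1.2, 1.5, 1.4, 1.7, 1.3, 1.6]
u_wt, u_kd, p_value = mann_whitney_u(wild_type, knockdown)
print(u_wt, u_kd, p_value)
```

With sample sizes in the hundreds, as reported in the figure legend, the normal approximation is generally considered reasonable; for small samples an exact test would be preferable.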

We tested whether NORAD depletion also affects cell-cycle progres- 
sion. We labelled newly synthesized DNA with 5-ethynyl-2’-deoxyuridine 
(EdU) and measured EdU incorporation and total DNA content by 
fluorescence-activated cell sorting. We observed a clear decrease in 
S phase accompanied by increased G1 phase in NORAD-, RBMX- 
and TOP1-depleted cells (Fig. 4b, c and Extended Data Fig. 5b-f). 
Consistent with these findings, impaired replication-fork progression 
has been linked to chromosome mis-segregation”® (as observed in 
NORAD and RBMX knockdown cells), which in turn can trigger a 
cell-cycle arrest in the subsequent G1 phase”’. Importantly, a G1 arrest 
alone cannot explain the reduction in replication-fork velocity observed 
above. We next examined whether the NORAD-RBMX interaction
is important for this effect on cell-cycle progression. Expression of 
full-length NORAD in trans was sufficient to rescue cell-cycle defects 
in NORAD-knockdown cells (Fig. 4b and Extended Data Fig. 5b, e). 
By contrast, a NORAD construct that lacks the RBMX-binding site 
decreased S phase and increased G2/M phase; this contrasts with 
NORAD knockdown and may point towards an altered molecular 
function of truncated NORAD (Fig. 4b and Extended Data Fig. 5b, e). 
Deletion of the RBMX-binding site may therefore act as a dominant 
negative alteration, which indicates that the RBMX-binding region is 
required for correct NORAD function. 

Our results link the known function of members of the NARC1 
complex (particularly TOP1) in preventing replication stress and 
genome instability’®!*!*?7 to the role of NORAD in suppressing ane- 
uploidy. Importantly, we demonstrate that the RBMX-binding region 
in NORAD contributes to NORAD function, presumably by promoting 
NARC1 assembly.

It has widely been suggested that lncRNAs participate in assembling
groups of proteins, but lncRNA-protein complexes have been fully
characterized for only a few lncRNAs; these include XIST’, TERC’,
NEAT1*, MALAT1° and HOTAIR®. Our results demonstrate that
NORAD is essential for the assembly of the ribonucleoprotein complex
NARC1, which physically links proteins known to be involved in
DNA replication or repair but not known to act together. We suggest a
model in which depletion of NORAD or deletion of its RBMX-binding
site disrupts NARC1, which alters replication-fork velocity and impairs
cell-cycle progression. It is tempting to speculate that altered DNA
replication causes cells to accumulate the observed chromosome
segregation defects, a known cause of genomic instability and aneuploidy*®”?
(Fig. 4d). While our data demonstrate a central role of NARC1 in the
NORAD phenotype, other proteins or complexes may contribute to
additional aspects of NORAD function.

The precise mechanism or mechanisms by which NORAD promotes 
NARC1 assembly remain to be elucidated but might include (1) induc- 
ing a conformational change in RBMX, (2) recruiting a large number 
of RBMX molecules to its 5’ end to create a protein interaction scaffold 
or (3) using other direct interactions to bring NARC1 members into 
close proximity. RBMX encodes a large low-complexity domain that 
can self-assemble and undergo phase separation in vitro*°. Binding 
of RBMX to NORAD may nucleate the formation of higher-order 
RBMX assemblies that facilitate binding of other proteins that contain 
a low-complexity domain. 

In addition to these structural features, NORAD has unusual func- 
tional features in that NORAD localization to the nucleus can be trig- 
gered by DNA damage, which may allow cells to rapidly assemble 
NARC1 or to re-localize pre-assembled complexes without the need 
for additional protein synthesis. 


Online content 

Any methods, additional references, Nature Research reporting summaries, source 
data, statements of data availability and associated accession codes are available at 
https://doi.org/10.1038/s41586-018-0453-z.
METHODS


No statistical methods were used to predetermine sample size. The experiments 
were not randomized and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Tissue culture. We maintained HCT116 cells (ATCC) in McCoy’s 5A 
(Thermo Fisher Scientific) with 10% heat-inactivated FBS (HIFBS, Thermo 
Fisher Scientific), 1 mM sodium pyruvate, 2 mM L-glutamine, and 100 units/ml
streptomycin and 100 mg/ml penicillin. Cells were grown at 37°C and 5% CO2
atmosphere. 

Lentivirus production. We plated 700,000 HEK293T cells in 6-well tissue
culture plates and grew them for 24 h before transfecting with 1 µg dVPR, 300 ng
VSVG, and 1.2 µg transfer plasmid using FuGene HD (Promega). Sixteen hours
after transfection we changed the medium to DMEM with 20% HIFBS. At 48 h
post-transfection, we collected viral supernatants and filtered them through a
0.45 µm syringe filter before use.

Generation of CRISPR interference cell lines. We generated inducible CRISPR 
interference (CRISPRi) cell lines by transducing HCT116 cells with a construct 
expressing rtTA linked by IRES to a neomycin resistance cassette expressed from 
an EF1a promoter (ClonTech) and selecting with 200 µg/ml G418 (Thermo Fisher
Scientific). Next, rtTA-expressing HCT116 cells were transduced with a previously
described KRAB-dCas9 construct linked by IRES to BFP?!. We selected for cells
expressing BFP by fluorescence-activated cell sorting. Inducible NORAD, RBMX
and PUM2 knockdown cell lines were generated by transducing stable CRISPRi
lines with sgRNAs (expressed from a previously described sgOpti backbone?!) and
selecting with 1 µg/ml puromycin.

RAP MS. To capture endogenous NORAD transcripts, we designed and synthe- 
sized 5’ biotinylated 90-mer DNA oligonucleotides (Integrated DNA Technologies) 
antisense to the target RNA sequence. We used 26 probes that covered the entire 
NORAD sequence, with the exception of regions that matched to other transcripts 
or genomic regions as previously described**. For NORAD and RMRP antisense 
purifications we grew 500 million HCT116 cells per RNA. We supplemented cell 
culture medium with a final concentration of 200 µM 4-thiouridine and grew cells
for 8 h before crosslinking. Cells were washed once with PBS and then crosslinked
on ice using 0.8 J/cm² of 365-nm ultraviolet light in a Stratalinker (Stratagene).
Cells were then scraped from culture dishes, washed once with PBS, pelleted by
centrifugation at 500g for 5 min and flash-frozen in liquid nitrogen for storage at
−80°C. Preparation of total cell lysates was performed as previously described’.
For antisense purification of crosslinked protein-RNA complexes we included the
following modifications to the previously described procedure: all buffers were
pre-heated to 55°C. We used 50 µg pooled antisense probes for 500 million lysed
cells. For pre-clear of lysates and capture of RNA/DNA hybrids we used 5 ml
streptavidin Dynabeads MyOne C1 magnetic beads (Thermo Fisher Scientific) for
500 million cells. Elution of captured proteins from streptavidin beads was achieved
by digesting nucleic acids using 250 U of benzonase (Millipore), 25 U RNase A
and 1000 U RNase T1 (Thermo Fisher Scientific) for 8 h at 37°C. Trichloroacetic
acid-precipitated proteins were reconstituted in 8 M urea and 50 mM Tris-HCl
pH 7.8 and stored at −20°C until processing for mass spectrometry.

Protein digestion for RAP MS. RAP-captured proteins were resuspended 
in 40 µl of digestion buffer (8 M urea, 50 mM Tris-HCl pH 7.8), reduced (1 µl
of 160 mM DTT, 30 min, room temperature) and alkylated (1.6 µl of 250 mM
IAA, 45 min, dark, room temperature), followed by a 2 h Lys-c digestion (0.1 pg
per sample) at room temperature. Next, the samples were diluted with 120 µl of
100 mM Tris-HCl pH 7.8 to a final concentration of <2 M urea, and 0.5 µg of
trypsin was added for overnight digestion at room temperature with agitation.
Samples were quenched with 8.5 µl of formic acid and desalted on 4-punch STAGE-
Tips as previously described*.
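As a quick check of the dilution step, the final urea concentration follows from the volumes listed in this protocol. The sketch below assumes that all volumes are in microlitres (40 µl digestion buffer, 1 µl DTT, 1.6 µl IAA and 120 µl Tris-HCl diluent) and neglects the volume of the added enzymes; the variable names are illustrative.

```python
# Volumes in microlitres, read from the digestion protocol (units are assumed).
buffer_ul = 40.0     # digestion buffer containing 8 M urea
dtt_ul = 1.0         # 160 mM DTT
iaa_ul = 1.6         # 250 mM IAA
diluent_ul = 120.0   # 100 mM Tris-HCl pH 7.8, contains no urea

urea_stock_molar = 8.0

# Only the digestion buffer contributes urea; everything else dilutes it.
total_ul = buffer_ul + dtt_ul + iaa_ul + diluent_ul
final_urea_molar = urea_stock_molar * buffer_ul / total_ul

print(f"total volume: {total_ul:.1f} ul")
print(f"final urea: {final_urea_molar:.2f} M")
```

Under these assumptions the result sits just under 2 M, which agrees with the stated target of <2 M urea before trypsin is added.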

iTRAQ labelling of peptides and BRP fractionation for RAP MS. Desalted 
peptides were labelled with iTRAQ4* reagent according to the manufacturer’s 
instructions (AB Sciex). Peptides were dissolved in 30 µl of 50 mM triethylammonium
bicarbonate (TEAB) pH 8.5 and labelling reagent was added in 70 µl of
ethanol. Samples were incubated with labelling reagent for 1 h with agitation, and
the reaction was quenched with 5 µl of 1 M Tris-HCl pH 7.8. Differentially labelled
peptides were subsequently mixed and prepared for BRP fractionation on 50 mg 
SepPak columns according to the following protocol: cartridges were prepared for 
desalting by equilibrating with methanol, 50% acetonitrile (ACN), 1% formic acid 
and 3 washes with 0.1% TFA. Samples were loaded on to the cartridge and washed 
3 times with 1% formic acid. A pH switch was performed before the collection 
of BRP fractions with 5 mM ammonium formate at pH 10. BRP fractions were 
collected at the following ACN concentrations: 10% ACN in 5 mM ammonium 
formate; 15% ACN in 5 mM ammonium formate; 20% ACN in 5 mM ammonium 
formate; 30% ACN in 5 mM ammonium formate; 40% ACN in 5 mM ammonium 
formate; and 50% ACN in 5 mM ammonium formate. 
Co-immunoprecipitation and MS. To capture RBMX-interacting proteins, we 
grew 15 million inducible CRISPRi cells with stably integrated NORAD sgRNAs 
for each immunoprecipitation experiment. For NORAD depletion samples,
we induced knockdown by supplementing cell culture medium with 0.5 µg/ml
doxycycline for 72 h, while NORAD wild-type samples were grown without
doxycycline. Cells were washed in PBS, trypsinized and collected by centrifugation.
Cell pellets were washed twice with ice-cold PBS and cell numbers were counted
and normalized between knockdown and wild-type samples. Fresh cell pellets
containing 15 million cells were lysed in 375 µl co-IP lysis buffer (50 mM Tris-HCl
pH 7.5, 150 mM NaCl, 1% NP40, 0.1% sodium deoxycholate, and Halt Protease and
Phosphatase Inhibitor Cocktail (Thermo Fisher Scientific)). Lysates were incubated
on ice for 30 min and mixed by pipetting every 5-10 min to enhance nuclear lysis.
Lysates were cleared by centrifugation at 14,000g for 10 min at 4°C and insoluble
material was removed. We pre-cleared lysates by incubating with 50 µl protein A
magnetic beads (Thermo Fisher Scientific) for 30 min at 4°C. Meanwhile, 900 ng
RBMX antibody (Cell Signaling #14794) was pre-coupled to 50 µl protein A beads
for 45 min at room temperature. We determined the total protein concentration
in pre-cleared lysates by BCA assay in triplicate and normalized all samples to
contain exactly 2.5 mg total protein. To non-specifically digest all DNA and RNA,
we added 50 U benzonase and 1 mM MgCl2 to all lysates. Free RBMX antibody was
removed from magnetic beads and benzonase-treated lysates were added to beads
and incubated overnight at 4°C. The next day, supernatant was removed and beads
were washed twice in 50 mM Tris-HCl pH 7.5, 150 mM NaCl and 0.05% NP40,
followed by two washes in 50 mM Tris-HCl pH 7.5 and 150 mM NaCl. After the
last wash, beads were overlaid with 10 µl PBS and immediately subjected to sample
preparation for mass spectrometry and TMT labelling. 
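Normalizing each lysate to 2.5 mg of total protein amounts to dividing the target mass by the measured concentration. The following sketch shows that arithmetic for BCA readings taken in triplicate; the sample names and concentrations are invented for illustration and the helper function is hypothetical.

```python
from statistics import mean

TARGET_PROTEIN_MG = 2.5  # total protein per immunoprecipitation, as stated in the protocol


def lysate_volume_ul(bca_replicates_mg_per_ml, target_mg=TARGET_PROTEIN_MG):
    """Volume of lysate (in microlitres) that contains target_mg of protein."""
    concentration = mean(bca_replicates_mg_per_ml)  # mg/ml, averaged over replicates
    return target_mg / concentration * 1000.0


# Hypothetical BCA triplicates (mg/ml) for a wild-type and a knockdown lysate.
readings = {
    "wild_type": [8.0, 8.2, 7.8],
    "knockdown": [10.0, 10.1, 9.9],
}
volumes = {name: lysate_volume_ul(vals) for name, vals in readings.items()}
for name, vol in volumes.items():
    print(f"{name}: {vol:.1f} ul")
```

Equal protein input matters here because the downstream comparison is quantitative: differences in TMT ratios between wild-type and knockdown samples should reflect changes in RBMX interactions rather than differences in loading.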

On-bead protein digestion for co-IP MS. Following immunoprecipitation, 
washed beads were resuspended in 90 µl of digestion buffer (2 M urea, 50 mM
Tris-HCl pH 7.8, 2 mM DTT, 0.005 µg/ml sequencing-grade trypsin) and
incubated for 1 h with agitation at 700 rpm. The supernatant was removed and placed
in a fresh tube. Beads were washed two times with 60 µl of 2 M urea in 150 mM
Tris-HCl pH 7.8, and washes were combined with the supernatant. This
procedure was repeated twice to ensure complete removal of proteins from the beads.
Supernatants were combined and proteins were reduced (3.5 µl of 500 mM DTT,
30 min, room temperature) and alkylated (9 µl of IAA, 45 min, room temperature,
dark), before digestion with 4 µg of trypsin overnight at room temperature with
agitation. Samples were acidified (1% formic acid) and desalted on Waters 10 mg 
Oasis HLB cartridges. 

TMT labelling of peptides and BRP Fractionation for co-IP MS. Desalted 
peptides were labelled with TMT6 reagent according to the manufacturer's 
instructions (Thermo Fisher Scientific). Peptides were dissolved in 25 µl of HEPES
pH 8.5 and 0.2 mg of TMT labelling reagent was added to each sample in 10 µl of
ACN. Samples were incubated with labelling reagent for 1 h with agitation. Next,
the reaction was quenched with 2 µl of 5% hydroxylamine. Differentially labelled
peptides were subsequently mixed and prepared for BRP fractionation on 50 mg 
SepPak columns according to the following protocol: cartridges were prepared for 
desalting by equilibrating with methanol, 50% ACN, 1% formic acid and 3 washes 
with 0.1% TFA. Samples were loaded on to the cartridge and washed 3 times with 
1% formic acid. A pH switch was performed with 5 mM ammonium formate at 
pH 10, collected and run as fraction 1. Subsequent fractions were collected at the 
following ACN concentrations: 10% ACN in 5 mM ammonium formate; 15% ACN
in 5 mM ammonium formate; 20% ACN in 5 mM ammonium formate; 30% ACN
in 5 mM ammonium formate; 40% ACN in 5 mM ammonium formate; and 50%
ACN in 5 mM ammonium formate. 

LC-MS/MS Analysis (RAP MS and co-IP MS). Reconstituted peptides were 
injected onto a capillary column (Picofrit with 10-µm tip opening, 75-µm diameter,
New Objective) packed in-house with 20 cm C18 silica material (1.9 µm ReproSil-
Pur C18-AQ medium, Dr Maisch GmbH), and separated on an online nanoflow
EASY-nLC 1000 UHPLC system (Thermo Fisher Scientific). Columns were heated 
to 50°C in column heater sleeves (Phoenix-ST) to reduce back-pressure during 
the gradient. 

RAP MS experiments. Peptides were separated at a flow rate of 200 nl/min with 
a linear 120-min gradient from 100% solvent A (3% ACN, 0.1% formic acid) to 
35% solvent B (90% ACN, 0.1% formic acid) for 82 min, followed by a 3-min linear 
increase from 35 to 90% B with a 5-min hold at 60% B before increasing to 90% 
B for 3 min and holding for 20 min, and equilibrating back at 50% B for 10 min 
to end the gradient. 

Co-IP MS experiments. Peptides in each BRP fraction were separated at a flow rate 
of 200 nl/min over a linear gradient of 100% A to 20% B for 28 min, with a linear 
increase from 20% B to 60% B for 16 min, and a hold at 90% B for 5 min before 
returning to 50% B. 

Peptides were analysed on an Orbitrap Q Exactive Plus mass spectro- 
meter (Thermo Fisher Scientific) operated in data-dependent mode. Higher- 
energy collision dissociation tandem mass spectrometry (HCD MS/MS) scans 
(resolution = 17,500 for iTRAQ and TMT methods) were taken after each MS1
scan (resolution = 70,000) on the top 12 most abundant ions using an AGC target of
3 x 10° ions for MS1 and 5 x 10* ions for MS2. The isolation widths for MS/MS ions
were 1.6 for iTRAQ and TMT methods. The maximum ion fill-time for MS/MS 
scans was 120 ms, the HCD-normalized collision energy was 29; dynamic exclu- 
sion time was set to 20 s, and peptide match and isotope exclusion functions were 
enabled. 

Quantification and identification of peptides and proteins (RAP MS and co-IP MS). 
All mass spectra were processed using the Spectrum Mill software package 
v.6.01 pre-release (Agilent Technologies), which includes modules developed for 
iTRAQ and TMT6-based quantification. Precursor ion quantification was done 
using extracted ion chromatograms for each precursor ion. The peak area for 
the extracted ion chromatogram of each precursor ion subjected to MS/MS was 
calculated in the intervening high-resolution MS1 scans of the LC-MS/MS runs 
using narrow windows around each individual member of the isotope cluster. 
Peak widths in both time and m/z domains were dynamically determined on the 
basis of mass spectrometry scan resolution, precursor charge and m/z, subject to 
quality metrics on the relative distribution of the peaks in the isotope cluster versus 
theoretical. Similar MS/MS spectra acquired on the same precursor m/z in the 
same dissociation mode within ± 60 s were merged. MS/MS spectra with precursor
charge >7 and poor quality MS/MS spectra, which failed the quality filter by having 
a sequence tag length less than 1, were excluded from searching. 

For peptide identification, MS/MS spectra were searched against the human 
Uniprot database to which a set of common laboratory contaminant proteins was 
appended. Search parameters included: ESI-QEXACTIVE-HCD scoring parameters,
trypsin or Lys-C/trypsin enzyme specificity with a maximum of 2 missed
cleavages, 40% minimum matched peak intensity, ± 20 ppm precursor mass
tolerance, ± 20 ppm product mass tolerance, and carbamidomethylation of cysteines
and isobaric labelling of lysines and N-termini as fixed modifications in the RAP 
MS (iTRAQ) and the immunoprecipitation mass spectrometry (TMT6) experi- 
ments with no fixed modification on lysines or N-termini for the size-exclusion 
chromatography experiment. Oxidation of methionine, N-terminal acetylation 
and deamidation (N) were allowed as variable modifications, with a precursor MH+
shift range from −18 to 64 Da. Identities interpreted for individual spectra were
automatically designated as valid by optimizing score and delta rank1-rank2 score 
thresholds separately for each precursor charge state in each LC-MS/MS run, while 
allowing a maximum target-decoy-based false-discovery rate (FDR) of 1.0% at 
the spectrum level. 

In calculating scores at the protein level and reporting the identified proteins, 
redundancy is addressed in the following manner: the protein score is the sum 
of the scores of distinct peptides. A distinct peptide is the single highest scoring 
instance of a peptide detected through an MS/MS spectrum. MS/MS spectra for 
a particular peptide may have been recorded multiple times (that is, from differ- 
ent precursor charge states, isolated from adjacent BRP fractions or modified by 
oxidation of Met), but are still counted as a single distinct peptide. When a pep- 
tide sequence over eight residues long is contained in multiple protein entries in 
the sequence database, the proteins are grouped together and the highest scoring 
one and its accession number are reported. In some cases in which the protein 
sequences are grouped in this manner, there are distinct peptides that uniquely 
represent a lower scoring member of the group (isoforms or family members). 
Each of these instances spawns a subgroup, and multiple subgroups are reported 
and counted towards the total number of proteins identified. iTRAQ and TMT
ratios were obtained from the protein comparisons export table in Spectrum Mill. 
To obtain iTRAQ or TMT protein ratios, the median was calculated over all of the 
distinct peptides assigned to a protein subgroup in each replicate. For RAP MS 
we required each protein to be detected with more than two unique peptides. To 
enable precise quantification, we limited our analysis to peptides that are uniquely 
assigned to a specific protein isoform or family member. For co-IP MS we required 
more than four unique peptides. For statistical analysis, we used the Limma pack- 
age* in R (https://www.r-project.org/) to calculate multiple comparison adjusted 
P values using a moderated t-test. 
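
The protein-level summarization described above can be illustrated with a short sketch. This is not the Spectrum Mill implementation: the function name, the input layout (one tuple of protein subgroup, peptide sequence and log2 ratio per distinct, uniquely assigned peptide) and the default threshold are assumptions made for illustration only.

```python
from collections import defaultdict
from statistics import median


def protein_ratios(peptide_rows, min_unique_peptides=3):
    """Median log2 ratio per protein subgroup (illustrative sketch).

    peptide_rows: iterable of (subgroup, peptide_sequence, log2_ratio).
    Rows are assumed to be already reduced to distinct peptides that are
    uniquely assigned to one subgroup (hypothetical input layout).
    min_unique_peptides=3 encodes "more than two unique peptides" (RAP MS);
    pass 5 for the co-IP MS criterion of "more than four".
    """
    by_subgroup = defaultdict(dict)
    for subgroup, peptide, log2_ratio in peptide_rows:
        # A repeated peptide sequence is counted once; the first row is kept.
        by_subgroup[subgroup].setdefault(peptide, log2_ratio)

    ratios = {}
    for subgroup, peptides in by_subgroup.items():
        if len(peptides) >= min_unique_peptides:
            ratios[subgroup] = median(peptides.values())
    return ratios
```

In this sketch the filter is applied per replicate; the moderated t-test itself is left to Limma and is not reproduced here.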

Native RBMX co-IP and size-exclusion chromatography. To capture native RBMX
complexes, we generated stable HCT116 cell lines that express Flag-RBMX-V5.
For co-IP and size-exclusion chromatography experiments, we grew 200 million 
cells. Cells were collected by scraping culture dishes, washed once with PBS and 
pelleted by centrifugation at 500g for 5 min. Fresh cell pellets were lysed in 8 ml 
co-IP lysis buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 1% NP40, 0.1% sodium 
deoxycholate, and Halt Protease and Phosphatase Inhibitor Cocktail (Thermo 
Fisher Scientific)). Lysates were incubated on ice for 30 min and mixed by pipetting
every 5-10 min to enhance nuclear lysis. Lysates were cleared by centrifugation 
at 14,000g for 10 min at 4°C and insoluble material was removed. We pre-cleared 
lysates by incubating with 2.5 ml protein G magnetic beads (Thermo Fisher 
Scientific) for 30 min at 4°C. Meanwhile, 600 µg Flag M2 antibody (Sigma # F1804)
was pre-coupled to 2.5 ml protein G beads for 45 min at room temperature. To 
non-specifically digest DNA and RNA, we added 500 U benzonase to the cell lysate. 
Free Flag M2 antibody was removed from magnetic beads and benzonase-treated
lysates were added to beads and incubated overnight at 4°C. The next day,
supernatant was removed and beads were washed twice in 50 mM Tris-HCl pH 7.5,
150 mM NaCl and 0.05% NP40, followed by two washes in 50 mM Tris-HCl 
pH 7.5 and 150 mM NaCl. After the last wash, protein complexes were eluted using
250 µg Flag-peptide in 500 µl of 150 mM NaCl, 25 mM Tris pH 7.5, 0.05% IGEPAL.
Elutions were incubated 1 h at 4°C with agitation. Eluates were separated from
beads and filtered using a 0.2-µm membrane filter. Size-exclusion chromatography
of the RBMX complex was performed using a Superose 6 Increase 10/300 column 
(GE Healthcare) equilibrated in 150 mM NaCl, 25 mM Tris pH 7.5, 0.05% IGEPAL. 
We injected 400 µl of the eluate onto the column at a flow rate of 0.4 ml/min and
collected 0.5-ml fractions. Two hundred and fifty microliters of each fraction was 
subjected to trichloroacetic acid-precipitation to concentrate proteins. Protein 
content was analysed by western blotting and mass spectrometry. 

For mass spectrometry analysis, proteins were reduced, alkylated and denatured 
at 90°C for 5 min, spun down and separated by SDS-PAGE. The gel was
run in 1x MES SDS-PAGE running buffer at 175 V for 40 min, after which it was 
stained for 2 h in SimplyBlue Safe Stain (Thermo Fisher Scientific) and destained 
in water overnight. The gel lane was cut into 4 fractions, diced and destained with 
50% ACN, 50% 100 mM ammonium bicarbonate. Destaining buffer was removed
and gel pieces were dehydrated with 300 µl of ACN. ACN was aspirated once the
gel pieces were white. One-and-a-half micrograms of trypsin was added to each
of the 4 fractions in 100 µl of 100 mM ammonium bicarbonate (pH 8) and
incubated overnight at 37°C. The supernatant from each fraction was collected
into a fresh tube, and the peptides were extracted from the gel pieces by washing
twice with 60% ACN, 0.1% formic acid and collecting the extract in the tube with 
the initial supernatant. Finally, gel pieces were dehydrated with ACN, which was 
collected with the rest of the extract. Fractions were then dried using a Speedvac 
concentrator, reconstituted in 3% ACN and 0.1% formic acid, and desalted on 
C18 Stage Tips**. Eluate from each fraction was transferred to HPLC vials, dried 
down and reconstituted in 5 µl of 3% ACN, 5% formic acid and run on an EASY-nLC
1200 coupled to an Orbitrap Q Exactive Plus mass spectrometer. The previously 
described method for co-IP MS experiments (see ‘Co-IP MS experiments’ above) 
was used for analysis, with the only difference being a normalized collision energy 
of 25, which is routinely used for label-free peptide analysis. 

To extract RNA from size-exclusion chromatography fractions containing the 
protein complex as well as control fractions we Trizol-extracted the remaining 
250 µl of sample, and isolated RNA using Direct-zol columns (Zymo Research).
We removed rRNA with the NEBNext rRNA Depletion Kit (New England Biolabs) 
by following the manufacturer's instructions. Finally, we constructed RNA- 
sequencing libraries using the SMARTer smRNA-Seq Kit (Clontech) by following 
the manufacturer's instructions. Libraries were sequenced on an Illumina HiSeq 
2500 instrument to an average read depth of 15-20 million reads with 50-bp read 
1 and 60-bp read 2. We trimmed 5 bp from the beginning of read 1 and 15 bp from 
the beginning of read 2 before mapping. Reads aligning to rRNA were removed 
from downstream analysis*°. Reads were then mapped to hg19 using Bowtie2. 
Mapping results were restricted to the single best alignment found for any given 
read. Discordant alignments of paired-end reads were excluded from analysis. 
Data normalization was performed by scaling coverage values by (1,000,000/total 
mapped read count). 
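
The scaling step described above amounts to a reads-per-million normalization. A minimal sketch follows; the function name and the list-of-depths input are assumptions for illustration and do not reflect the tooling actually used.

```python
def normalize_coverage(coverage, total_mapped_reads):
    """Scale raw per-position coverage by 1,000,000 / total mapped reads."""
    if total_mapped_reads <= 0:
        raise ValueError("total_mapped_reads must be positive")
    scale = 1_000_000 / total_mapped_reads
    return [depth * scale for depth in coverage]
```

For example, a library with 20 million mapped reads is scaled by 0.05, so a raw depth of 10 reads becomes a normalized value of about 0.5.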

CLIP. The CLIP protocol below is extensively based on three previously published 
CLIP methods: irCLIP*’, PAR-CLIP** and eCLIP”. 

We constructed the pre-adenylated irCLIP adaptor as previously described*”. 
All other oligonucleotides were synthesized as described in the irCLIP 
protocol, with the exception of reverse transcription primers. We replaced 
ethyleneglycol spacers with three deoxyuridines and modified the 5’ end 
of reverse transcription primers to reflect the nucleotide preference of 
CircLigase II (general structure:
/5phos/RNNNNN-6nt-barcode-NNNNNTACCCTTCGCTTCACACACAAG/ideoxyU//ideoxyU//ideoxyU/TACTGAACCGC).

For each CLIP experiment we grew 20 million HCT116 cells in medium
supplemented with 200 µM 4-thiouridine for 8 h. Cells were washed once with PBS and
then crosslinked on ice using 0.2 J/cm² of 365-nm ultraviolet light in a Stratalinker.
Cells were then scraped from culture dishes, washed once with PBS, pelleted by 
centrifugation at 500g for 5 min and flash-frozen in liquid nitrogen for storage at
−80°C. To prepare cell lysates, pellets were thawed on ice and resuspended in NP40
lysis buffer (50 mM HEPES pH 7.5, 150 mM KCl, 2 mM EDTA, 1% (v/v) NP40, 
0.25 mM DTT, complete EDTA-free protease inhibitor cocktail) and incubated 
on ice for 10 min. We sonicated cell lysates using a Branson Digital Sonifier with 
a microtip set at 5 W power for a total of 1 min 30 s in intermittent pulses (0.7-s 
on, 2.3-s off), followed by RNase I (Thermo Fisher Scientific) digestion (0.5 U/µl,
10 min at 23°C). Subsequently, we added 15 µl/ml Murine RNase Inhibitor
(New England Biolabs), followed by DNA digestion (20 U TURBO DNase (2 U/µl;
Thermo Fisher Scientific), 2.5 mM MgCl2 and 0.5 mM CaCl2) for 20 min at 37°C.
We incubated samples on ice for 10 min before clearing lysates by centrifugation at
15,000g for 15 min. Insoluble material was removed and total protein concentration
was determined by BCA assay. Cell lysates were flash-frozen and stored in batches
of 10 mg total protein at −80°C.

For each immunoprecipitation experiment, lysates (10 mg total protein) were 
thawed on ice and pre-cleared by incubating with protein A/G magnetic beads (using 
30 µl/mg total protein) for 30 min at 4°C. In the meantime, antibodies (6 µg/mg
total protein) were coupled to protein A/G magnetic beads (using 30 µl/mg total
protein) at room temperature for 45 min (antibodies used: RBMX, Cell Signaling
#14794; ALYREF, Bethyl # A302-892A; PUM1, Bethyl # A302-577A; V5, Abcam
# ab27671). We removed unbound antibody and added the pre-cleared lysates 
to antibody-coupled beads and incubated overnight at 4°C. The following day, 
we washed the beads 3 times in IP wash buffer (50 mM HEPES pH 7.5, 300 mM 
KCl, 0.5% (v/v) NP40, 0.25 mM DTT, complete EDTA-free protease inhibitor 
cocktail), followed by one wash in FastAP buffer (10 mM Tris-HCl pH 8.0, 5 mM 
MgCl2, 100 mM KCl, 0.02% Triton X-100). Immunopurified protein-RNA
complexes were dephosphorylated by resuspending beads in 25 µl FastAP mix (18.5 µl
H2O, 2.5 µl 10× FastAP buffer (Thermo Fisher Scientific), 2.5 U FastAP enzyme
(1 U/µl; Thermo Fisher Scientific), 0.5 µl Murine RNase Inhibitor (New England
Biolabs)) and incubating for 20 min at 37°C. In the meantime, we prepared
polynucleotide kinase mix (56 µl H2O, 10 µl 10× PNK buffer (New England Biolabs),
1 µl Murine RNase Inhibitor, 7 µl T4 PNK (10 U/µl; New England Biolabs), 1 µl
TURBO DNase) and added 75 µl to each 25 µl sample and incubated 20 min at
37°C. Beads were separated on a magnet and the dephosphorylation reaction was
removed before washing beads once in RNA ligation buffer without DTT (50 mM
Tris-HCl pH 7.5, 10 mM MgCl2). Next, 3’ ligation was performed by
resuspending beads in 20 µl ligation mix (3 µl H2O, 2 µl 10× T4 RNA ligation buffer
(New England Biolabs), 1 µl DMSO, 1 µl RNase inhibitor, 15 pmoles
pre-adenylated 3’ adaptor, 10 µl 50% PEG 8000, 2 µl T4 RNA Ligase 1 High Concentration
(New England Biolabs)) using low-retention pipette tips and incubated overnight
at 16°C with agitation. The next day we added 7 µl 4× NuPAGE LDS Sample Buffer
(Thermo Fisher Scientific) to ligation reactions and incubated samples for 10 min 
at 75°C. Protein-RNA complexes were resolved by SDS-PAGE using NuPAGE 
4-12% Bis-Tris-HCl Gels (Thermo Fisher Scientific) at 200 V for 1 h, followed by 
transfer to a nitrocellulose membrane using the iBlot Dry Blotting System (Thermo 
Fisher Scientific). Protein-RNA complexes were visualized using the Odyssey Clx 
infrared imager (LI-COR) and desired complexes were excised from membrane 
using a clean scalpel. Membrane pieces were immediately subjected to proteinase 
K treatment by adding 250 µl proteinase K solution (4 mg/ml Proteinase K (New
England Biolabs), 100 mM Tris-HCl pH 7.5, 150 mM NaCl, 12.5 mM EDTA, 1% 
(w/v) SDS) and incubating 1 h at 55°C. Following proteinase K treatment, RNA 
was phenol-chloroform extracted using Heavy Phase Lock Gel tubes (5Prime) 
and purified with the Zymo RNA Clean & Concentrator-5 kit by following the 
manufacturer's instructions for small and large RNAs. We eluted RNA in 7 µl
H2O and combined it with 10 pmoles of reverse transcription primer. Samples
were heated to 72°C for 2 min and snap-cooled on ice. Reverse transcription was
performed with the SuperScript III kit (Thermo Fisher Scientific) by
combining RNA samples with 4 µl 5× First Strand Buffer, 2 µl 0.1 M DTT and 6 µl 2 mM
dNTPs. Samples were incubated at 50°C for 3 min before adding 1 µl SuperScript
III reverse transcription enzyme (200 U/µl) and incubating 1 h at 42°C. For each
reverse transcription reaction, we washed 5 µl MyOne streptavidin C1 beads twice
in NT2 buffer (50 mM Tris-HCl pH 7.5, 150 mM NaCl, 1 mM MgCl2 and 0.0005%
NP40) and then resuspended the beads in 100 µl NT2 buffer. The resuspended
beads were then added to the reverse transcription reaction and the mixture was 
subsequently incubated for 15 min at room temperature. Beads were washed twice 
in streptavidin wash buffer (20 mM Tris-HCl pH 7.5, 120 mM NaCl, 25 mM KCl,
5 mM EDTA, 1% Triton X-100, 1% sodium deoxycholate) and twice in PBS to 
remove unbound reverse transcription primer. Finally, we resuspended beads in 
10 µl of freshly prepared elution buffer (8.25 µl H2O, 1 µl 1 µM elution
oligonucleotide (CTGAACCGCTCTTCCGATCT), 0.75 µl of 50 mM MnCl2) and heated
the reaction for 5 min at 95°C, 1 min at 75°C, followed by a ramp of 0.1°C per s
to 60°C and holding at 60°C for 15 min. Once the 60°C incubation temperature
was reached, we prepared CircLigase mix (2 µl H2O, 2 µl 10× CircLigase-II buffer
(Epicentre), 0.25 µl 50 mM MnCl2, 4 µl 5 M betaine, 1 µl CircLigase-II (Epicentre))
and added 10 µl to each elution without removing beads. We incubated the
reaction 2 h at 60°C. Following incubation, we added 2 reaction volumes Agencourt
AMPure XP beads (Beckman Coulter) and 5 reaction volumes isopropanol and 
incubated 15 min at room temperature. Supernatant was removed and beads were 
washed once with 80% ethanol (v/v), air-dried and eluted by resuspending dry 
beads in 25 µl H2O, heating for 2 min at 95°C and isolating supernatants. We used
12 µl cDNA for PCR amplification (program: 2 min at 98°C followed by 12-15
cycles of 98°C for 15 s, 65°C for 30 s and 72°C for 30 s, followed by a final 20-min
extension at 72°C) in a 50 µl reaction using 25 µl 2× NEBNext Q5 Hot Start HiFi
PCR Master Mix (New England Biolabs), 12 µl H2O and 1 µl of 25 µM PCR1 primer
mix containing P3_PCR1 (GCATTCCTGCTGAACCGCTCTTCCGATCT)
and P6_PCR1 (TTTCCCCTTGTGTGTGAAGCGAAGGGTA) primers. PCR
reactions were subjected to two consecutive rounds of purification using 1.5 
volumes of Agencourt AMPure XP beads and two 70% ethanol washes. DNA 
was eluted in 14 µl H2O and subjected to a second PCR amplification
(program: 2 min at 98°C followed by 3-6 cycles of 98°C for 15 s and 72°C for 45 s,
followed by a final 2-min extension at 72°C) in a 50 µl reaction using 12 µl
purified PCR1, 25 µl 2× NEBNext Q5 Hot Start HiFi PCR Master Mix (New England
Biolabs), 12 µl H2O and 1 µl of 25 µM PCR2 primer mix containing P3_PCR2
(CAAGCAGAAGACGGCATACGAGATCGGTCTCGGCATTCCTGCTGAACCGCTCTTCCGATCT)
and P6_PCR2
(AATGATACGGCGACCACCGAGATCTACACTCTTTCCCCTTGTGTGTGAAGCGAAGGGTA)
primers. Upon completion of PCR amplification, we added 10 µl ExoSAP-IT PCR Product
Cleanup Reagent (Thermo Fisher Scientific) and incubated reactions for 15 min
at 37°C. Reactions were purified using 1.1 volumes of Agencourt AMPure XP
beads and two 70% ethanol washes, followed by elution of air-dried beads in 10 µl
H2O. The concentration of final libraries was determined with the Qubit dsDNA
HS Assay (Thermo Fisher Scientific) and library sizes were analysed on a High 
Sensitivity DNA Bioanalyzer Chip (Agilent). 

Size-matched input libraries (SM input)°? were prepared by resolving 1-2% of 
input lysates by SDS-PAGE using NuPAGE 4-12% Bis-Tris-HCl Gels (Thermo
Fisher Scientific) at 200 V for 1 h. SDS-PAGE gels were transferred to a nitrocel- 
lulose membrane using the iBlot Dry Blotting System (Thermo Fisher Scientific) 
and proteins migrating at the molecular weight range of the target protein were 
excised using a clean scalpel. RNA was released by proteinase K treatment and 
purified as described in the previous section. We performed end-repair of input 
RNA by adjusting RNA volume to 19.5 µl with H2O and adding 2.5 µl 10× FastAP
buffer (Thermo Fisher Scientific), 2.5 U FastAP enzyme (1 U/µl; Thermo Fisher
Scientific) and 0.5 µl Murine RNase Inhibitor (New England Biolabs), and incubating
for 20 min at 37°C. In the meantime, we prepared polynucleotide kinase mix (56 µl
H2O, 10 µl 10× PNK buffer (New England Biolabs), 1 µl Murine RNase Inhibitor,
7 µl T4 PNK (10 U/µl; New England Biolabs), 1 µl TURBO DNase) and added
75 µl to each 25 µl sample and incubated samples for 20 min at 37°C. RNA was
purified with the Zymo RNA Clean & Concentrator-5 kit using the manufacturer's
instructions for small and large RNAs. RNA was eluted in 5 µl H2O and combined
with 25 µl ligation mix (3 µl 10× T4 RNA ligation buffer (New England Biolabs),
1.5 µl DMSO, 1.5 µl RNase inhibitor, 15 pmoles pre-adenylated 3’ adaptor, 15 µl 50%
PEG 8000, 3 µl T4 RNA Ligase 1 High Concentration (New England Biolabs)) using
low-retention pipette tips and incubated for 2 h at 23°C with agitation. Ligation
reactions were purified to remove free 3’ adaptor using two consecutive Silane
bead purifications. For each reaction, we washed 15 µl Silane beads (Thermo Fisher
Scientific) twice in 1 ml RLT buffer (Qiagen), resuspended beads in 90 µl RLT and
combined 90 µl beads in RLT with 30 µl ligation reaction. We added 0.7 volumes
100% ethanol and incubated mixtures 10 min at room temperature. Supernatant
was removed and beads were washed twice with 70% ethanol before eluting
air-dried beads in 9 µl H2O. We used 7 µl of the eluted RNA for reverse transcription
and proceeded with the library preparation as described in the above section. 

Computational analysis of CLIP data. We sequenced CLIP and corresponding
SM input libraries on an Illumina HiSeq 2500 to an average read depth of 30-50 
million reads with 52-bp read 1 and 35-bp read 2. The first read includes a 6-nt 
barcode added during reverse transcription (see ‘CLIP’ above). After processing 
to separate samples based on inline barcodes, sequencing reads collected from all 
CLIP experiments were first mapped to hg19 using TopHat (v.2.0.8)"°. Reads align- 
ing to rRNA were removed from downstream analysis, as previously described*°. 
Duplicate reads were identified and removed using Picard’s MarkDuplicates pro- 
gram. Peak calling was performed with the MACS2" algorithm to identify genomic 
coordinates where experimental conditions (protein IP) were significantly enriched 
for reads relative to size-matched controls (SM input). Peak calling was performed 
without a shifting model and the band width to compute fragment size was set to 
100 bp. Significant peaks are reported with FDR correction of q=0.05. Significant 
peaks were further filtered to include only regions with an average minimum depth 
of two reads in the size-matched control condition. To compile significant results 
across replicate experiments, we intersected the intervals from the peak calling 
output of each replicate. Normalized coverage in the intersection peaks was first 
calculated separately for each replicate as the average depth at a given peak divided 
by the total number of reads after correcting for the observed duplication rate. The 
mean of the relative fold change between the two replicates was calculated for each 
peak and peaks that did not show a twofold or greater change in both replicates 
were excluded. We report a CLIP signal score for a given peak as the product of 
enrichment (average fold change) and the peak length (see Supplementary Table 3). 
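
The replicate filter and the CLIP signal score described above can be sketched as follows. The function name, argument layout and example numbers are hypothetical; the sketch takes the per-replicate average peak depths and duplicate-corrected read totals as already computed, and it is not the pipeline used to produce Supplementary Table 3.

```python
def clip_signal_score(peak_length, ip_depths, input_depths,
                      ip_totals, input_totals, min_fold=2.0):
    """Return the CLIP signal score for one peak, or None if it is filtered out.

    Each *_depths / *_totals sequence holds one value per replicate:
    the average read depth in the peak and the total read count after
    correcting for duplication. Input depths are assumed to be non-zero
    (peaks were required to have a minimum control depth).
    """
    folds = []
    for ip_d, in_d, ip_t, in_t in zip(ip_depths, input_depths,
                                      ip_totals, input_totals):
        ip_norm = ip_d / ip_t      # normalized coverage, protein IP
        in_norm = in_d / in_t      # normalized coverage, SM input
        folds.append(ip_norm / in_norm)

    # Exclude peaks that are not at least twofold enriched in every replicate.
    if any(fold < min_fold for fold in folds):
        return None

    mean_fold = sum(folds) / len(folds)
    return mean_fold * peak_length
```

With two replicates enriched 10-fold and 12-fold over a 50-bp peak, the sketch returns a score of about 550; if either replicate falls below twofold, the peak is dropped.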

RNA-sequencing and analysis. We performed RNA-sequencing on cells that stably
expressed individual sgRNAs targeting the NORAD promoter, with or without 
doxycycline-induced KRAB-dCas9 expression. We performed at least 2 biolog- 
ical replicate experiments for knockdown and control conditions after 24 h, 48 h 
and 96 h of KRAB-dCas9 induction. RNA-sequencing libraries were constructed
as previously described. Reads were pseudo-aligned to hg19 (ENSEMBL
transcripts) using kallisto” with an index of either 31 or 21 k-mers. Estimated counts
were collapsed across transcripts into genes and differential expression analysis
was performed using DESeq2**. Genes with an absolute log2(fold change) > 1
and FDR < 0.05 were considered as differentially expressed. P values for differ- 
ential gene expression were corrected using the Benjamini-Hochberg procedure 
to derive an FDR. 
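
For readers unfamiliar with the Benjamini-Hochberg procedure and the thresholds stated above, a minimal stand-alone sketch is given here. DESeq2 performs this adjustment internally; the function names and inputs below are assumptions for illustration and are not part of the DESeq2 interface.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values (FDR) in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def differentially_expressed(genes, log2_fold_changes, p_values,
                             min_abs_log2fc=1.0, max_fdr=0.05):
    """Genes with |log2(fold change)| > 1 and FDR < 0.05."""
    fdr = benjamini_hochberg(p_values)
    return [gene for gene, lfc, q in zip(genes, log2_fold_changes, fdr)
            if abs(lfc) > min_abs_log2fc and q < max_fdr]
```

Both thresholds are exposed as parameters so that the same helper can be reused with other cut-offs.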

Alternative splicing analysis. Percentage spliced in (PSI) for different exons 
or introns was calculated using SUPPA2“* based on isoform transcripts per 
million (TPM) estimates from kallisto for skipping exon, alternative 5’ or 3’ 
splice sites, mutually exclusive exons, retained intron and alternative first or last 
exons. Differential PSI was calculated using diffSplice*® with the parameters ‘--area
1000 --tpm-threshold 5 --lower-bound 0.00 -gc’. Events with a change in PSI > 20%
and FDR < 0.05 were considered as differentially used. P values across putative 
splicing events were corrected using the Benjamini-Hochberg procedure to derive 
an FDR. 
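
As a rough illustration of the quantities involved, PSI for an event can be computed from isoform TPM values as the abundance of the isoforms that include the event divided by the abundance of all isoforms that define it. The sketch below assumes this definition and uses hypothetical function names; it does not reproduce the SUPPA2 statistical model or its TPM threshold.

```python
def psi(inclusion_tpms, all_event_tpms):
    """Percentage spliced in, as a fraction between 0 and 1.

    inclusion_tpms: TPMs of isoforms that include the exon or intron.
    all_event_tpms: TPMs of all isoforms that define the event
                    (inclusion and exclusion forms together).
    Returns None when the event is not expressed.
    """
    total = sum(all_event_tpms)
    if total == 0:
        return None
    return sum(inclusion_tpms) / total


def is_differentially_used(psi_control, psi_knockdown, fdr,
                           min_delta=0.20, max_fdr=0.05):
    """Apply the stated cut-offs: change in PSI > 20% and FDR < 0.05."""
    return abs(psi_knockdown - psi_control) > min_delta and fdr < max_fdr
```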

RNA extraction and RT-qPCR. We extracted RNA from 20,000-50,000 cells 
per experiment in RLT buffer (Qiagen) using Dynabeads MyOne Silane beads 
(Thermo Fisher Scientific), treated samples with TURBO DNase (Thermo Fisher 
Scientific) and cleaned again with Silane beads. We used AffinityScript reverse 
transcriptase (Agilent Technologies) and random nonamer primers to convert 
RNA to cDNA. We performed qPCR using SYBR Green I Master Mix (Roche) 
and calculated differences using the ΔΔCt method versus GAPDH. To achieve
power to detect small effects in gene expression, we performed three technical 
qPCR replicates (from the same cDNA) and took the median value for further
analysis. We used the following RT-qPCR primers in this study. RBMX
forward primer: CAGTTCGCAGTAGCAGTGGA, RBMX reverse primer:
TCGAGGTGGACCTCCATAAC; NORAD forward primer: CTCTGCTGTGGCTGCCC,
NORAD reverse primer: GGGTGGGAAAGAGAGGTTCG; PUM2 forward primer:
GGGAGCTTCTCACCATTCAATG, PUM2 reverse primer: CCATGAAAACCCTGTCCAGATC;
GAPDH forward primer: AGCCACATCGCTCAGACAC, GAPDH reverse primer:
GCCCAATACGACCAAATCC; MALAT1 forward primer: AGTTCAGTGTTGGGGCAATC,
MALAT1 reverse primer: GTTCTTCCGCTCAAATCCTG; TOP1 forward primer:
TCGAAGCGGATTTCCGATTGA, TOP1 reverse primer: CTTTGTGCCGGTGTTCTCGAT.
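
The ΔΔCt calculation with median-collapsed technical replicates can be sketched as below. The conversion to relative expression as 2 to the power of −ΔΔCt is the conventional form of the method and is an assumption here, as are the function name and the example Ct values.

```python
from statistics import median


def delta_delta_ct(target_ct_sample, ref_ct_sample,
                   target_ct_control, ref_ct_control):
    """Relative expression (2^-ddCt) of a target gene versus a reference gene.

    Each argument is a list of technical-replicate Ct values from the same
    cDNA; the median of each list is used, as described in the text.
    """
    d_ct_sample = median(target_ct_sample) - median(ref_ct_sample)
    d_ct_control = median(target_ct_control) - median(ref_ct_control)
    dd_ct = d_ct_sample - d_ct_control
    return 2 ** (-dd_ct)


# Hypothetical example: target gene versus GAPDH, knockdown versus control.
relative = delta_delta_ct([26.0, 26.1, 25.9], [18.0, 18.0, 18.1],
                          [24.0, 24.0, 24.2], [18.0, 17.9, 18.0])
```

In this example the ΔΔCt is 2 cycles, corresponding to a relative expression of 0.25 (a fourfold reduction).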

Co-IP western blot. Co-IP experiments were carried out as described above
(see ‘Co-immunoprecipitation and MS’). The following antibodies were used 
for immunoprecipitation reactions: RBMX, Cell Signaling #14794, or Santa 
Cruz Biotechnology # sc-14581; ALYREF, Bethyl # A302-892A; TOP1, Bethyl # 
A302-589A; CDC5L, Bethyl #A301-681A. Following the last washing step, we
resuspended beads in 20 µl Pierce IgG Elution Buffer (Thermo Fisher Scientific)
and incubated them for 20 min at room temperature with agitation. We collected
supernatants and added 7 µl 4× NuPAGE LDS Sample Buffer (Thermo Fisher
Scientific), followed by a 3-min incubation at 95°C. Proteins were resolved by 
SDS-PAGE using NuPAGE 4-12% Bis-Tris-HCl Gels (Thermo Fisher Scientific) 
at 200 V for 1 h, followed by transfer to a nitrocellulose membrane using the iBlot 
Dry Blotting System (Thermo Fisher Scientific). Proteins larger than 150 kDa in 
size were resolved on NuPAGE 3-8% Tris-Acetate Gels (Thermo Fisher Scientific). 
Western blots were performed using the iBind Western System (Thermo Fisher 
Scientific). For protein detection, we used the following primary antibodies: 
RBMX, Cell Signaling #14794, or Santa Cruz Biotechnology #sc-14581; ALYREF,
Santa Cruz Biotechnology #sc-32311; TOP1, Santa Cruz Biotechnology # sc-32736;
CDC5L, Santa Cruz Biotechnology #sc-81220. We used the following secondary
antibodies: IRDye 680RD Goat anti-Mouse IgG (H + L) (LI-COR), IRDye 800CW
Goat anti-Rabbit IgG (H + L) (LI-COR), IRDye 800CW Donkey anti-Goat IgG 
(H + L) (LI-COR). For visualization of bands, we used the Odyssey Clx infrared 
imager system (LI-COR). 

NORAD conservation analysis. We tested for conservation of NORAD tran- 
scription across 11 mammalian species: human, chimpanzee, gorilla, orangutan, 
rhesus macaque, mouse, rat, ferret, dog, cow and armadillo. Because expression of 
NORAD is highest in human brain** we checked for transcription in brain tissue 
from these 11 species. Raw RNA-sequencing read data were downloaded from pre- 
vious studies*”° and mapped to respective genomes using STAR v.2.5.2a™, with 
gene annotations from Ensembl Release 91°" as a reference guide. For each RNA- 
sequencing library, a de novo transcriptome was made using Stringtie v.1.3.3b™, 
using default parameters. Samples from the same species were then merged using 
the stringtie-merge option. To find reciprocal best hits, we used nucleotide BLAST 
with default parameters. Multiple sequence alignment was created using MAFFT*? 
with gap penalty reduced to 1.0. 

Subcellular fractionation. We prepared nuclear and cytoplasmic extracts from 
freshly grown HCT116 cells using the PARIS Kit (Thermo Fisher Scientific) by 
following the manufacturer’s instructions. We used ~10 million cells for each 
fractionation experiment and analysed extracts by western blot or RT-qPCR as 
described in the corresponding sections. 


smRNA FISH. smRNA FISH experiments were performed using the ViewRNA 
Cell Plus Assay Kit (Thermo Fisher Scientific) and following the manufacturer's 
instructions. We grew 50,000 cells in black 12-well glass-bottom plates (Cellvis) 
for 24 h. To induce DNA damage, we supplemented culture medium with 
doxorubicin (1 µM) or camptothecin (200 nM) for 12 h. We washed cells once
with PBS before fixation. Cells were then fixed and permeabilized simultaneously 
in fixation/permeabilization buffer for 30 min at room temperature on a rotating 
plate. After three brief washes in PBS, we incubated cells with the appropriate probe 
set diluted 1:100 in probe set diluent for 2 h at 40°C, then with preamplifier mix 
at 40°C for 70 min, followed by amplifier mix at 40°C for 70 min, and finally label 
probe mix at 40°C for 60 min. For nuclei staining, we incubated the cells for 2 min 
at room temperature with 1x ViewRNA Cell Plus DAPI in PBS. Cells were then 
washed three times in PBS and then incubated with Alexa Fluor 647 phalloidin
(Cell Signaling Technology) diluted 1:20 in PBS for 15 min at room temperature 
for staining of actin filaments. After a final set of washes, we covered cells with 
ProLong Gold Anti-Fade Reagent (Cell Signaling Technology) and stored the plates 
at 4°C until imaging. The probe sets and corresponding fluorophores were type 
1 - NORAD and MALAT1 (Alexa Fluor 546) and type 4 - GAPDH (Alexa Fluor
488). Confocal microscopy was performed using a Nikon Eclipse Til with Andor 
Yokogawa Spinning Disk Revolution WD system. 

Quantification of RNA FISH images. For three-dimensional FISH image analysis, 
Z-stacks were exported such that the top and bottom slices were the beginning and 
end of DAPI signal in the z direction. Quantification of FISH foci was done with 
FISH-quant™ in MATLAB (version R2017b) following the software's instructions 
for mature mRNA quantification. Before spot detection, a dual Gaussian filter was 
applied to the images in FISH-quant using the default settings. The outlines of nuclei
and cells were determined automatically with the Cell Segmentation Tool in FISH- 
quant and a modified version of a Cell Profiler pipeline provided in the FISH-quant 
repository. In Cell Profiler (v.2.2.0)*°, nuclear boundaries were determined by the 
Otsu method guided by DAPI staining. Identified nuclei were then used as seeds 
to identify the boundaries of cells by the watershed method aided with the phal- 
loidin stain. For all probes, the local maximum strategy of spot pre-detection was 
used. Settings for thresholding pre-detected spots were optimized for each probe 
separately to account for differences in signal intensity. 

In situ proximity ligation assay. In situ proximity ligation assay (PLA) was per- 
formed using the Duolink PLA platform (Sigma) and following the manufacturer's 
instructions. Cells were plated in black, glass-bottom 96-well plates the day before 
the experiment and allowed to grow overnight at 37°C. Cells were fixed with 4% 
paraformaldehyde in PBS for 15 min at room temperature, washed three times in 
PBS and then permeabilized with 0.5% Triton X-100 in PBS for 15 min at room 
temperature. Cells were blocked for 1 h at 37°C in a humidified chamber using 
the Duolink blocking solution, and subsequently stained with primary antibodies 
diluted 1:250 in Duolink antibody diluent for 1 h at room temperature. Duolink
PLA probes (Rabbit PLUS and Mouse MINUS) were added for 1 h at 37°C. The 
ligation and subsequent amplification steps were performed for 30 min and 
100 min, respectively, at 37°C. Upon completion of the assay, cells were overlaid 
with Duolink mounting medium with DAPI. Two sets of primary antibody pairs 
were used: rabbit anti-RBMX (Cell Signaling Technologies #14794) was paired 
with mouse anti-TOP1 (Thermo Fisher Scientific #435900); mouse anti-Flag (Cell 
Signaling Technologies #8146S) (targeting Flag-RBMX) was paired with rabbit 
anti-TOP1 (Bethyl #A302-589A). 

Quantification of PLA images. For PLA signal quantification we used Cell Profiler 
3.0.0. Separate maximum intensity projections for each channel were exported. 
Nuclei and PLA-signal segmentation was performed using the minimal cross 
entropy thresholding method. We applied default settings for nuclei segmenta- 
tion, whereas the PLA signal detection required more stringent thresholding to 
distinguish individual spots within clusters. A size filter was applied to exclude 
overlapping nuclei from the analysis. The total nuclear PLA spot count was nor- 
malized to the total nuclear area for each cell. 
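The normalization step described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' CellProfiler pipeline: the function name and the two input lists (per-cell nuclear spot counts and nuclear areas, for example in pixels, as exported from a segmentation tool) are assumptions.

```python
def spots_per_unit_area(spot_counts, nuclear_areas):
    """Divide each cell's nuclear PLA spot count by its nuclear area.

    Both arguments are hypothetical per-cell lists in the same order.
    """
    if len(spot_counts) != len(nuclear_areas):
        raise ValueError("need one area per spot count")
    densities = []
    for count, area in zip(spot_counts, nuclear_areas):
        if area <= 0:
            # A non-positive area indicates a segmentation problem.
            raise ValueError("nuclear area must be positive")
        densities.append(count / area)
    return densities
```

Normalizing to area in this way keeps large nuclei from appearing to carry more signal simply because they offer more space for spots.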

Immunostaining of cultured cells for anaphase nuclei imaging. We induced 
knockdown in CRISPRi cells with stably integrated sgRNAs by supplementing cell 
culture medium with 0.5 µg/ml doxycycline for 48 h. Cells were then trypsinized
and plated in multi-well glass-bottom plates (Cellvis), again supplementing culture 
medium with doxycycline in knockdown cells, and grown for an additional 24 h. 
We removed culture medium, rinsed each well in PBS and fixed cells in 4% para- 
formaldehyde (PFA) for 10 min at room temperature. All subsequent manipulation 
steps were carried out in a humidified chamber. PFA was removed, cells were 
washed twice in PBS and permeabilized by incubating with PBS + 0.1% Triton 
X-100 for 10 min at room temperature. Following permeabilization, we blocked 
cells in PBS containing 4% BSA (Roche), 10% goat serum (Sigma Aldrich) and 
0.1% Triton X-100 for 30 min at room temperature. Primary antibodies
(anti-α-tubulin-FITC antibody, Sigma Aldrich #F2168 (1:1,000); anti-centromere
antibodies, Antibodies Incorporated #15-234-0001 (1:200)) were diluted in blocking
buffer (PBS containing 4% BSA (Roche), 10% goat serum (Sigma Aldrich) and
0.1% Triton X-100) and incubated in the dark for 2 h at room temperature or
overnight at 4°C. Following antibody incubation, cells were washed 3 times in PBS 
+ 0.1% Triton X-100 and incubated 10 min at room temperature in between each 
washing step. Secondary antibody (Goat anti-Human IgG (H+L) Cross-Adsorbed,
Alexa Fluor 568, Thermo Fisher Scientific #A-21090 (1:250)) was diluted in blocking
buffer and added to cells for 1 h at room temperature in the dark. Cells were
washed 3 times in PBS + 0.1% Triton X-100 and incubated 10 min at room tem- 
perature in between each washing step. Following removal of any residual washing 
buffer, we covered cells with ProLong Gold Antifade reagent containing DAPI 
(Thermo Fisher Scientific) and allowed them to cure overnight in the dark before 
imaging. Confocal microscopy was performed using a Nikon Eclipse Til with 
Andor Yokogawa Spinning Disk Revolution WD system. 

Generation of NORAD trans rescue constructs and cell lines. We synthesized 
the full-length NORAD cDNA (Genewiz) and cloned it into pDONR221 (Thermo 
Fisher Scientific). Using Gateway technology, we cloned NORAD downstream of a 
CAG promoter in a destination vector expressing BFP linked by IRES to a hygromycin
resistance cassette driven by an EF1α promoter (ClonTech). 5’-truncated
NORAD was generated by deleting bases 33-898 from NORAD in pDONR221
using site-directed mutagenesis (Q5 Site-Directed Mutagenesis Kit, New England 
Biolabs). Sequence-verified 5’-truncated NORAD pDONR221 was cloned into the 
described destination vector using LR recombination. 

Sequence-verified rescue constructs were transfected into CRISPRi cells with 
stably integrated NORAD sgRNAs using FuGene HD (Promega) by following the 
manufacturer's instructions. We selected cells that stably integrated NORAD rescue 
constructs by selecting with hygromycin B (Sigma Aldrich) at a final concentration 
of 25 µg/ml. Knockdown of endogenous NORAD was achieved by inducing KRAB-dCas9
expression in CRISPRi cell lines stably expressing sgRNAs targeting the
endogenous NORAD promoter using doxycycline at 0.5 µg/ml. RT-qPCR primers
specific to the 5’ end of NORAD (forward primer: CTCTGCTGTGGCTGCCC, 
reverse primer: GGGTGGGAAAGAGAGGTTCG) or a middle segment of 
NORAD (forward primer: CTCTCCACCACCAACCTGATG, reverse primer: 
GGAAGTGAGATAACATCAGCTCTAA) were used to verify expression of 
full-length or 5’-truncated NORAD in cells depleted of endogenous NORAD. 
Cell-cycle analysis. Cell-cycle analysis was carried out by measuring EdU incor- 
poration and total DNA content. CRISPRi cells with stably integrated NORAD or 
RBMX sgRNAs and stably integrated rescue cassettes expressing different NORAD 
constructs (full-length NORAD, 5’-truncated NORAD or empty rescue cassette) 
were maintained in medium containing hygromycin B (Sigma Aldrich) at a final 
concentration of 12.5 µg/ml. Induction of KRAB-dCas9 and constitutive expression
of rescue cassettes was routinely monitored by fluorescence-activated cell sorting.
Medium supplemented with 0.5 µg/ml doxycycline was added to knockdown
samples for 48 h. We then trypsinized cells and plated them in 24-well cell culture 
plates using 100,000 cells per well and incubated them for another 24 h in the 
presence of doxycycline. We labelled newly replicating DNA by supplementing cell 
culture medium with 10 µM EdU for 1 h. Cells were washed with PBS, trypsinized
and transferred to a 96-well round-bottom plate for improved handling of many
samples in parallel. We used the Click-iT Plus EdU Flow Cytometry Assay Kit
(Thermo Fisher Scientific) and followed the manufacturer’s instructions with the 
following modifications. For improved multiplexing, we reduced the number of 
cells per assay by a factor of 10 (1 × 10^6 cells/ml) and scaled down washing volumes
accordingly. The Click-iT reaction was performed using half the recommended
reagent volumes per sample. After the last washing step, cells were resuspended in 
PBS containing FxCycle Far Red Stain (Thermo Fisher Scientific) as well as RNase 
Cocktail (Thermo Fisher Scientific) and incubated for 30 min at room tempera- 
ture to stain total DNA. Fluorescence-activated cell sorting was performed on a 
CytoFLEX S Instrument (Beckman Coulter). 
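As a rough illustration of how EdU incorporation and total DNA content can be combined to assign cell-cycle phases, the sketch below classifies each cell with two thresholds. It is a minimal sketch under stated assumptions, not the gating strategy used in this study: the function names, the threshold parameters and the example values are all hypothetical, and real gates are normally drawn on the instrument's own plots.

```python
def classify_cell(edu_signal, dna_content, edu_threshold, g1_g2_boundary):
    """Assign a phase from EdU signal and DNA content (hypothetical gates)."""
    if edu_signal >= edu_threshold:
        return "S"  # EdU-positive cells are replicating DNA
    # EdU-negative cells are split by DNA content.
    return "G1" if dna_content < g1_g2_boundary else "G2/M"


def phase_percentages(cells, edu_threshold, g1_g2_boundary):
    """Return the percentage of cells in each phase.

    `cells` is a list of (edu_signal, dna_content) tuples.
    """
    if not cells:
        raise ValueError("no cells to classify")
    counts = {"G1": 0, "S": 0, "G2/M": 0}
    for edu, dna in cells:
        counts[classify_cell(edu, dna, edu_threshold, g1_g2_boundary)] += 1
    return {phase: 100.0 * n / len(cells) for phase, n in counts.items()}
```

For example, four cells with (EdU, DNA) values of (5, 1.0), (500, 1.4), (8, 2.0) and (3, 1.1), gated at an EdU threshold of 100 and a DNA boundary of 1.5, would be split into two G1 cells, one S-phase cell and one G2/M cell.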

RNAi knockdown experiments. For RNAi knockdown experiments, we plated 
50,000 HCT116 cells 24 h before transfection into 24-well tissue culture plates using 
antibiotic-free medium. We transfected 50 nM short interfering RNAs (siRNAs) 
into each well using Lipofectamine RNAiMAX (Thermo Fisher Scientific) by 
following the manufacturer’s instructions. Medium was changed the day after 
transfections and cells were incubated with siRNAs for a total time of 72 h. Cell- 
cycle analysis and RT-qPCR were performed as described in the above sections 
(see ‘Cell-cycle analysis’ and ‘RNA extraction and RT-qPCR’).

DNA combing. We induced knockdown in CRISPRi cells with stably integrated 
sgRNAs by supplementing cell culture medium with 0.5 µg/ml doxycycline for 72 h.
Knockdown and wild-type cells were labelled with a final concentration of 100 µM
CldU for 70 min. CldU-containing medium was removed, cells were washed twice
with warm medium (which did not contain any thymidine analogues) and then
incubated with a final concentration of 100 µM IdU for 70 min. IdU-containing
medium was removed, cells were washed twice with warm PBS and trypsinized. 
We counted cells in triplicates and used 75,000 cells for each experiment. Cells were 
embedded in agarose plugs using the FibrePrep DNA Extraction Kit (Genomic 
Vision) by following the manufacturer's instructions. DNA combing, immuno- 
detection, image acquisition and data analysis were performed at specialized ser- 
vice facilities. Only intact replication origins with positive DNA counterstaining 
were used to measure fibre length and calculate replication fork velocity. 
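The relationship between labelled fibre length and replication fork velocity can be illustrated with a short calculation. This is a sketch under stated assumptions rather than the analysis performed by the service facility, which is not described here: the 70-min pulse comes from the protocol above, whereas the function name and the stretching factor of 2 kb per µm (a value commonly quoted for molecular combing) are assumptions.

```python
KB_PER_UM = 2.0  # assumed stretching factor for combed DNA (kb per micrometre)


def fork_velocity_kb_per_min(track_length_um, pulse_min=70.0, kb_per_um=KB_PER_UM):
    """Convert a labelled track length (µm) into a fork velocity (kb/min)."""
    if pulse_min <= 0:
        raise ValueError("pulse duration must be positive")
    # Length in kb divided by the duration of the labelling pulse.
    return track_length_um * kb_per_um / pulse_min
```

Under these assumptions, a 35 µm track labelled during a 70-min pulse would correspond to a velocity of 1 kb per min.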
Reporting summary. Further information on research design is available in 
the Nature Research Reporting Summary linked to this paper. 

Code availability. Code for the analyses described in this paper is available from 
the corresponding authors upon request. 

Data availability. Sequencing data for this study are available at the Gene 
Expression Omnibus under the accession number GSE114953. The original mass 
spectra may be downloaded from MassIVE (http://massive.ucsd.edu) using the 
identifier: MSV000082561. The data are directly accessible via
ftp://massive.ucsd.edu/MSV000082561. All other data are available from the corresponding authors
upon reasonable request. 
Extended Data Fig. 1 | RNA antisense purification of RMRP and
NORAD transcripts. RT-qPCR measurements of RNA yield in RMRP 
and NORAD RAP MS experiments. Columns represent the mean of two 
biological replicate experiments, individual data points are shown. 
Extended Data Fig. 2 | Subcellular localization of NORAD and analysis
of NORAD-protein interactions with DNA damage. a, smRNA FISH
of GAPDH, NORAD and MALAT1 in wild-type HCT116 cells. GAPDH,
cytoplasmic reference; MALAT1, nuclear reference. Actin is stained
with Alexa Fluor 647-conjugated phalloidin. Scale bar, 20 µm. Images
are representative of one experiment; three independent experiments 
were performed. b, Quantification of smRNA FISH experiments. Circles 
show medians; box limits, 25th and 75th percentiles; whiskers, 1.5 ×
interquartile range; polygons, extreme values. Method 1: phalloidin-
aided cell boundary detection using the watershed method. Method 2:
proximity-based cell boundary detection using the distance-N method
in Cell Profiler. Sample sizes: GAPDH method 1, n = 369; GAPDH method
2, n = 369; NORAD method 1, n = 299; NORAD method 2, n = 299;
MALAT1 method 1, n = 229; MALAT1 method 2, n = 229. c, Subcellular
fractionation of HCT116 cells. Lamin B2 and α-tubulin serve as controls
for nuclear and cytoplasmic fractions, respectively. Western blots are
representative of one experiment; three independent experiments were 
performed. d, RT-qPCR measurements of relative RNA levels in nuclear 
and cytoplasmic extracts. Quantification relative to GAPDH. Percent 
nuclear extract is calculated relative to the total signal observed in nuclear 
and cytoplasmic fractions. Values are mean ± standard deviation (n = 3).
e, RT-qPCR measurements of NORAD expression upon doxorubicin,
camptothecin or ultraviolet treatment in NORAD wild-type or knockdown
cells. Quantification relative to GAPDH. Values are mean ± standard
deviation (n = 4). f, Western blot of NORAD RAP experiments with
or without DNA damage. Western blots are representative of one
experiment; three independent experiments were performed. g, RT-qPCR
measurements of RNA yield in NORAD RAP experiments. Values are
mean ± standard deviation (n = 3).
Extended Data Fig. 3 | Analysis of NORAD knockdown, NORAD
conservation and NORAD-protein interactions. a, RT-qPCR 
measurements of NORAD, RBMX and PUM2 CRISPRi knockdown and 
NORAD rescue experiments. Quantification relative to GAPDH. Values 
are mean ± standard deviation (n = 3). b, Differentially expressed genes
in RNA-sequencing experiments from NORAD CRISPRi knockdown 
cells. c, Quantification of chromosome segregation errors in PUM2 
wild-type or knockdown cells. One hundred anaphases were scored for 
each condition. Columns represent the mean of two biological replicate 
experiments, individual data points are shown. d, Histogram of RBMX- 
binding-site length in CLIP experiments. e, Multiple sequence alignment 
of NORAD transcripts, assembled de novo from RNA-sequencing data 
from 11 mammalian species. Only transcribed sequences are shown. Blue 
bar indicates RBMX-binding site in human NORAD. Alignment colour
scheme: A, orange; C, blue; T, green; G, red. f, CLIP data plotted across
NORAD RNA for RBMX, FUBP1, FUBP3 and PUM1. RBMX SM input 
library is shown. Representative alignments from two biological replicates 
are shown. g, RBMX RIP in nuclear and cytoplasmic fractions. The 
percentage of nuclear RIP signal is calculated relative to the total signal 
observed in nuclear and cytoplasmic fractions. h, Immunofluorescence 
imaging of RBMX in HCT116 cells. Scale bar, 20 µm. Representative
images from three biological replicates are shown. i, Left, RT-qPCR 
measurements of NORAD RNA levels in nuclear and cytoplasmic 
extracts under RBMX CRISPRi wild-type or knockdown conditions. 
The percentage of nuclear NORAD is calculated relative to the total 
signal observed in nuclear and cytoplasmic fractions. Right, RT-qPCR 
measurements of RBMX CRISPRi knockdown. Quantification relative to 
GAPDH. Values are mean ± standard deviation (n = 3).
Extended Data Fig. 4 | Analysis of RBMX protein-protein interactions
and their dependency on NORAD. a, Ranked list of NORAD-dependent
RBMX-interacting proteins identified by quantitative co-IP MS
(Supplementary Table 4) and their respective rank in NORAD RAP
MS experiments. b, Western blot of two independent NORAD RAP
experiments with or without crosslink. Antibodies were pooled and 
incubated with the same membrane. Corresponding size regions were 
cropped for simplicity of presentation. c, Western blot of levels of TOP1, 
RBMX, PRPF19, CDC5L, BRCA1 and BRCA2 proteins in NORAD wild-type
and knockdown cells from two independent experiments. β-actin
serves as loading control. d, CLIP data plotted across NORAD RNA for
RBMX and ALYREF. RBMX SM input library is shown. Representative
alignments from two biological replicates are shown. e, Co-IP western blot
for TOP1, ALYREF, CDC5L, RBMX and IgG control. Inputs are shown
on the right. Western blots are representative of one experiment; three
independent experiments were performed. f, Western blot of Flag-RBMX- 
V5 co-IP followed by size-exclusion chromatography. Fractions 1-9 are 
shown. Fractions 10-20 were not probed for PRPF19 owing to overlap 
with Flag antibody at this size range (Supplementary Note 4). g, RT-qPCR 
measurements of NORAD 5’ fragment (light grey) and full-length NORAD 
(dark grey) in rescue experiments using full-length and 5’-truncated 
NORAD rescue constructs. Measurements correspond to cells used for 
proximity ligation assays. Quantification relative to GAPDH. Values are 
mean ± standard deviation (n = 6).
Extended Data Fig. 5 | Analysis of alternative splicing and cell-cycle
progression in NORAD-depleted cells. a, Venn diagram of significant
splicing changes (percentage spliced in (PSI) > 20%; FDR < 0.05) in
NORAD wild-type and knockdown cells at 24, 48 and 96 h (Supplementary
Table 6); 89,352, 88,529 and 84,340 events were analysed at 24, 48
and 96 h, respectively. Only six events were consistent between two
time points and none were consistent between all three time points. b,
RT-qPCR measurements of NORAD 5’ fragment (light grey) and full- 
length NORAD (dark grey) in rescue experiments using full-length and 
5’-truncated NORAD rescue constructs. Measurements correspond to 
cells used in cell-cycle analysis. Quantification relative to GAPDH. Values
are mean ± standard deviation (n = 5 or 6). c, RT-qPCR measurements of
RBMX CRISPRi knockdown. Quantification relative to GAPDH. Values
are mean ± standard deviation (n = 5). d, RT-qPCR measurements of
TOP1 RNA interference knockdown. Quantification relative to GAPDH.
Values are mean ± standard deviation (TOP1 siRNA, n = 6; control siRNA,
n = 5). e, Representative fluorescence-activated cell sorting histograms
measuring EdU incorporation and DNA content in RBMX and NORAD 
CRISPRi knockdown and NORAD rescue cells. Percentage of cells in each 
cell-cycle phase is indicated. f, As in e, but for TOP1 RNA interference 
knockdown cells. 
Structures of filaments from Pick’s disease reveal a novel tau protein fold

Benjamin Falcon, Wenjuan Zhang, Alexey G. Murzin, Garib Murshudov, Holly J. Garringer, Ruben Vidal,
R. Anthony Crowther, Bernardino Ghetti, Sjors H. W. Scheres & Michel Goedert


The ordered assembly of tau protein into abnormal filamentous 
inclusions underlies many human neurodegenerative diseases.
Tau assemblies seem to spread through specific neural networks
in each disease, with short filaments having the greatest seeding
activity. The abundance of tau inclusions strongly correlates with
disease symptoms. Six tau isoforms are expressed in the normal
adult human brain: three isoforms with four microtubule-binding
repeats each (4R tau) and three isoforms that lack the second repeat
(3R tau)¹. In various diseases, tau filaments can be composed of
either 3R or 4R tau, or of both. Tau filaments have distinct cellular
and neuroanatomical distributions, with morphological and
biochemical differences suggesting that they may be able to adopt
disease-specific molecular conformations. Such conformers may
give rise to different neuropathological phenotypes, reminiscent
of prion strains¹⁰. However, the underlying structures are not
known. Using electron cryo-microscopy, we recently reported
the structures of tau filaments from patients with Alzheimer’s
disease, which contain both 3R and 4R tau¹¹. Here we determine
the structures of tau filaments from patients with Pick’s disease, 
a neurodegenerative disorder characterized by frontotemporal 
dementia. The filaments consist of residues Lys254-Phe378 of 
3R tau, which are folded differently from the tau filaments in 
Alzheimer’s disease, establishing the existence of conformers of 
assembled tau. The observed tau fold in the filaments of patients 
with Pick’s disease explains the selective incorporation of 3R tau in 
Pick bodies, and the differences in phosphorylation relative to the 
tau filaments of Alzheimer’s disease. Our findings show how tau 
can adopt distinct folds in the human brain in different diseases, 
an essential step for understanding the formation and propagation 
of molecular conformers. 

We used electron cryo-microscopy (cryo-EM) to image tau filaments 
extracted from the frontotemporal cortex of a patient who had a 7-year 
history of behavioural-variant frontotemporal dementia (patient 4). 
Neuropathological examination revealed severe frontotemporal lobar 
degeneration, with abundant Pick bodies composed of 3R tau filaments,
without phosphorylation of Ser262 (Fig. 1a–d, Extended Data
Fig. 1, Extended Data Table 1), consistent with a diagnosis of Pick’s
disease. As in Alzheimer’s disease, a fuzzy coat composed of the
disordered N- and C-terminal regions of tau surrounded the filament
cores and was removed by pronase treatment (Fig. 1e and Extended
Data Fig. 1). Narrow (93%) and wide (7%) filaments could be distinguished
(Fig. 1e). The narrow filaments have previously been described
as straight, but they do have a helical twist with a crossover distance
of approximately 1,000 Å and a projected width that varies from
approximately 50 to 150 Å. The wide filaments have a similar crossover
distance, but their width varies from approximately 50 to 300 Å.
We named them narrow and wide Pick filaments (NPFs and WPFs,
respectively). Their morphologies and relative abundance match those
reported in cortical biopsies from patients with Pick’s disease.


Using helical reconstruction in RELION, we determined a 3.2 Å
resolution map of the ordered core of NPFs, in which side-chain densities
were well resolved and β-strands were clearly separated along the
helical axis (Fig. 1f and Extended Data Fig. 2). We also determined a
map of WPFs, which was limited to 8 Å resolution because of the small
number of filaments. This was sufficient to show separated β-sheets
within the structure, but not the 4.7 Å spacing of individual β-strands
along the helical axis (Fig. 1g and Extended Data Fig. 3). NPFs are
composed of a single protofilament with an elongated structure that
is markedly different from the C-shaped protofilament of paired
helical filaments (PHFs) and straight filaments found in Alzheimer’s
disease. WPFs are formed by the association of two NPF protofilaments
at their distal tips. In support, we observed WPFs in which one
protofilament had been lost in some parts (Extended Data Fig. 3). Our 
results reveal that the tau filaments of Pick’s disease adopt a single fold 
that is different from that of the tau filaments found in Alzheimer’s 
disease. 
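The crossover distance and the β-strand spacing quoted above together imply a very small rotation between successive rungs of the filament. The sketch below works this out, assuming (as is conventional, but not stated in the text) that one crossover corresponds to a 180° rotation about the helical axis and that the helical rise equals the 4.7 Å strand spacing. The function names are illustrative, and the calculation gives only the magnitude of the twist, not its handedness.

```python
def twist_per_rung_deg(crossover_angstrom, rise_angstrom=4.7):
    """Magnitude of the helical twist per rung, in degrees.

    Assumes one crossover distance equals a 180-degree rotation.
    """
    if crossover_angstrom <= 0 or rise_angstrom <= 0:
        raise ValueError("distances must be positive")
    return 180.0 * rise_angstrom / crossover_angstrom


def rungs_per_crossover(crossover_angstrom, rise_angstrom=4.7):
    """Number of stacked rungs spanned by one crossover distance."""
    if crossover_angstrom <= 0 or rise_angstrom <= 0:
        raise ValueError("distances must be positive")
    return crossover_angstrom / rise_angstrom
```

With a crossover distance of approximately 1,000 Å, this gives a twist of roughly 0.85° per rung and a little over 200 rungs per crossover, which is why such filaments can look almost straight in micrographs.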

The high-resolution NPF map allowed us to build an atomic model 
of the Pick tau filament fold (hereafter termed the Pick fold), which 
Fig. 1 | Filamentous tau pathology of Pick’s disease. a, The brain used for
cryo-EM (patient 4) showed atrophy of the anterior frontal and temporal
lobes of the cerebral cortex. Scale bar, 5 cm. b-d, Staining of Pick bodies
in the frontotemporal cortex of patient 4 using antibody RD3 (3R tau;
brown) (b), but not by antibodies anti-4R (4R tau) (c) or 12E8 (pS262 tau
and/or pS356 tau) (d). Nuclei were counterstained blue. Scale bars, 20 µm.
e, Cryo-electron micrograph of tau filaments extracted from grey matter
of the frontotemporal cortex of patient 4, in which narrow (NPFs; blue
arrow) and wide (WPFs; red arrow) Pick filaments could be distinguished.
Scale bar, 500 Å. f, Unsharpened cryo-EM density of NPF from patient 4.
Scale bar, 25 Å. g, Unsharpened cryo-EM density of WPF from patient 4.
Scale bar, 25 Å.
consists of residues Lys254–Phe378 of 3R tau (in the numbering of the
441-amino-acid human tau isoform) (Fig. 2). There are nine β-strands
(β1–β9) arranged into four cross-β packing stacks and connected by
turns and arcs. R1 provides two strands, β1 and β2, and R3 and R4 provide
three strands each, β3–β5 and β6–β8, respectively. These strands
pack together in a hairpin-like fashion: β1 against β8, β2 against β7,
β3 against β6, and β4 against β5. The final strand, β9, is formed from
nine amino acids after R4 and packs against the opposite side of β8.
Only the interface between β3 and β6 is entirely hydrophobic; the other
cross-β packing interfaces are composed of both non-polar and polar
side chains.
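Because the core is numbered according to the 441-amino-acid isoform, while 3R tau lacks the second repeat (residues 275-305 in that numbering, as noted in the legend of Fig. 2), the residue range Lys254–Phe378 contains fewer residues than the numbers alone suggest. The following sketch makes the bookkeeping explicit; the constant and function names are illustrative assumptions, not part of the published analysis.

```python
CORE_START, CORE_END = 254, 378  # ordered core, 441-isoform numbering
R2_START, R2_END = 275, 305      # second repeat, absent from 3R tau


def core_residue_numbers():
    """Residue numbers (441-isoform numbering) present in the 3R tau core."""
    return [
        n
        for n in range(CORE_START, CORE_END + 1)
        if not (R2_START <= n <= R2_END)
    ]


def in_pick_core(residue_number):
    """True if a residue number belongs to the ordered core of 3R tau."""
    in_range = CORE_START <= residue_number <= CORE_END
    in_r2 = R2_START <= residue_number <= R2_END
    return in_range and not in_r2
```

Under these definitions the core spans 125 residue numbers but contains 94 residues, because the 31 positions of the second repeat are skipped between Lys274 and Val306.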

The interstrand connections and their interactions maintain the 
strand pairings and compensate for differences in strand lengths and 
orientations. A sharp right-angle turn at Gly261, between β1 and β2,
faces a smooth four-residue arc formed by 355-Gly-Ser-Leu-Asp-358,
between β7 and β8, turning the protein chain in the same direction.
The 270-Pro-Gly-Gly-Gly-273 motif between β2 and β3 gives rise to an
omega-shaped turn that compacts the protein chain locally, but maintains
its direction at either end. On the opposite side, a β-arc formed of
Glu342 and Lys343, between β6 and β7, creates space for this turn. By
contrast, the homologous 332-Pro-Gly-Gly-Gly-335 motif connecting
β5 and β6 forms an extended β-spiral conformation, compensating for
the shorter lengths of these strands compared to the opposing β3 and
β4, which are connected by Pro312. Solvent-mediated interactions may
occur within the large cavity between this motif and the side chains at
the junction of β3 and β4. The third homologous 364-Pro-Gly-Gly-
Gly-367 motif contributes to a 180° turn that allows β9 to pack against
the other side of β8. Variations in the height of the chain along the
helical axis also help to maintain an ordered hydrogen-bonding pattern
of the β-stranded regions (Fig. 3c).

The solvent-exposed side chains of Cys322 and Ser324, together 
with the intervening Gly323, form a flat surface at the hairpin turn 
between β4 and β5. This provides the interface for the formation of
WPFs by the abutting of protofilaments (Extended Data Fig. 3). The
distances between protofilaments at this interface would enable van 
der Waals interactions, but not the formation of disulfide bonds. In 
support, WPFs were stable under reducing conditions (Extended Data 
Fig. 2 | The Pick tau filament fold.
a, Sharpened, high-resolution cryo-EM 
map of the NPF with the atomic model 
of the Pick fold overlaid. b, Schematic 
view of the Pick fold. Amino acid 
numbering corresponds to the 
441-amino-acid human tau isoform, so 
residues 275-305 of R2 are not present. 


Fig. 3). At the resolution of the WPF map, it was not possible to determine
whether protofilament β-strands are aligned or staggered by one
half of the β-strand distance, as is the case for PHFs in Alzheimer’s
disease¹¹. We conclude that WPFs are formed by two separate protofilaments
making tight contacts at their distal tips through van der
Waals interactions.

Three regions of less well-resolved density bordering the solvent- 
exposed faces of β4, β5 and β9 are apparent in the unsharpened maps
of both NPFs and WPFs (Fig. 1f, g). Their low resolution suggests that
they represent less ordered, heterogeneous and/or transiently occupied
structures. The density bordering β4 is similarly located, but more
extended and less well resolved, than that found to interact with the
side chains of Lys317, Thr319 and Lys321 in Alzheimer’s disease PHFs
and straight filaments¹¹, and hypothesized to be the N-terminal 7-Glu-
Phe-Glu-9, part of the discontinuous MC1 epitope. NPFs and WPFs
were labelled by MC1 (Extended Data Fig. 1).

It was not previously known why only 3R tau, which lacks the second 
microtubule-binding repeat, is present in Pick body filaments. Our 
results suggest that despite sequence homology, the structure formed 
by Lys254-Lys274 of the first tau repeat is inaccessible to the corre- 
sponding residues from the second repeat of 4R tau (Ser285-Ser305). 
The site occupied by Thr263, which is in close proximity to the backbone
of the opposite strand β7, cannot accommodate the bulkier
side chain of Lys294 from 4R tau and does not provide a favourable
environment for its charged ε-amino group (Extended Data Fig. 4).
Also, the site preceding the omega-like structure formed by 270-Pro-
Gly-Gly-Gly-273 cannot accommodate a Cβ-branched residue, such
as Val300 from 4R tau instead of Gln269 from 3R tau. In addition, the
smaller Cys291 residue from 4R tau would form weaker interactions 
with Leu357 and Ile360 than those formed by Ile260 of 3R tau. In sup- 
port, tau filaments extracted from the brain of the patient with Pick’s 
disease used for cryo-EM seeded the aggregation of recombinant full- 
length 3R, but not 4R, tau (Extended Data Fig. 5). Similar experiments 
have shown that Alzheimer’s disease PHFs and straight filaments, the 
core sequences of which are shared by 3R and 4R tau, can seed both 
types of isoform. Such templated misfolding explains the selective
incorporation of 3R tau in Pick body filaments. Pick’s disease extracts 
Fig. 3 | Comparison of the Pick and Alzheimer tau filament folds.
a, Sequence alignment of the microtubule-binding repeats (R1-R4) with
the observed nine β-strand regions (arrows) in the Pick fold and eight
β-strand regions (arrows) in the Alzheimer fold, coloured from violet to
red. b, Rendered view of the secondary structure elements in the Pick fold, 
depicted as three successive rungs. c, As in b, but in a view perpendicular 


have been reported to seed the aggregation of a 4R tau fragment comprising
the repeats (residues 244-372) with mutations Pro301Leu and
Val337Met. However, this tau fragment cannot form the Pick fold,
which is unable to accommodate R2 and requires residues 373-378.
A small amount of aggregated four-repeat tau may have accounted
for the seeding activity, as suggested in a separate study. Loss of von
Economo neurons in anterior cingulate and frontoinsular cortices has
been reported to be an early event in Pick’s disease. It remains to
be established how 3R tau seeds can form in cells that also express 4R
tau. Alternatively, nerve cell populations may be distinguished by the
tau isoforms that they express.

The cryo-EM structures presented here were derived from filaments 
that were extracted from the brain of a single patient with Pick’s disease 
(patient 4). To test the generality of the Pick fold, we investigated the 
binding of repeat-specific antibodies to tau filaments extracted from the 
frontotemporal cortex of eight additional patients with sporadic Pick’s 
disease (Extended Data Table 1). By western blotting, all samples ran 
as two tau bands of 60 and 64 kDa, which were detected by anti-R1, 
anti-R3 and anti-R4 antibodies, but not by an anti-R2 antibody, showing
the presence of only 3R tau (Extended Data Fig. 6). Immunogold
negative-stain electron microscopy showed that most filaments 
were NPFs, with a minority of WPFs, and were not decorated by the 
repeat-specific antibodies (Extended Data Fig. 7). This shows that the R1, 
R3 and R4 epitopes are inaccessible to the antibodies used, indicating that 
they form part of the ordered filament core. Alzheimer’s disease PHFs 
and straight filaments are decorated by anti-R2, but not by anti-R3 or 
anti-R4 antibodies, because their core is made of R3, R4 and the 10 amino 
acids following R4. These results are in good agreement with experiments
using limited proteolysis and mass spectrometry. We conclude
that the ordered core of tau filaments from Pick’s disease comprises the 
C-terminal part of R1, all of R3 and R4, as well as 10 amino acids after R4. 

Unlike PHFs and straight filaments in Alzheimer’s disease, Pick
body filaments are not phosphorylated at Ser262 and/or Ser356
(Extended Data Fig. 1). The reasons for this differential phosphorylation
are unknown. Our structure reveals that the tight turn at Gly261
prevents phosphorylation of Ser262 in the ordered core of Pick’s disease
filaments, whereas the phosphorylated Ser262 is outside the ordered
core of Alzheimer’s disease filaments¹¹. This explains the differential
to the helical axis, revealing the changes in height within a single molecule.
d, Schematic of the secondary structure elements in the Pick and
Alzheimer folds, depicted as a single rung. The positions of Cys322
(yellow ‘C’) and Asp348 (red ‘D’) in the two folds are highlighted. The
asterisk and hash symbols mark conserved turns of homologous regions in 
the Pick and Alzheimer folds. 


phosphorylation and raises the question of whether phosphorylation 
at Ser262 may protect against Pick’s disease. 

In the Pick and Alzheimer tau filament folds, most β-structure residues
between Val306 and Ile354 align locally, as do the connecting
segments of Pro312, 332-Pro-Gly-Gly-Gly-335 and 342-Glu-Lys-343
(Fig. 3). Almost all amino acid side chains from this region have the
same interior or solvent-exposed orientations in both folds. Exceptions
are Cys322 and Asp348, which cause reversed chain directions in one
or other fold (Fig. 3d). The side chain of Cys322 is interior in the
Alzheimer tau filament fold, whereas it is solvent-exposed in the Pick
fold. This enables the hairpin-like turn and the cross-β packing of β4
against β5. The side chain of Asp348 is interior in the Pick tau filament
fold, thereby maintaining the β-structure from Lys343 to Ile354
(β7), whereas it is solvent-exposed in the Alzheimer fold, enabling the
tight turn between β5 and β6, which, together with β4, gives rise to a
triangular β-helix conformation. Such β-helices, previously thought
to be important for propagation, are absent from the Pick tau filament
fold. The β-strands in Gly355–Phe378 align well in both folds,
but have different cross-β packing arrangements. The solvent-exposed
side chains of β7 and β8 in the Alzheimer fold are interior in the equivalent
strands of the Pick fold (β8 and β9), because of different conformations
of the two turn regions in R4, 355-Gly-Ser-Leu-Asp-358
and 364-Pro-Gly-Gly-Gly-367. The 355-Gly-Ser-Leu-Asp-358 motif 
makes a sharp right-angle turn at Gly355 in the Alzheimer tau filament 
fold, but a wide turn in the Pick fold. The same sharp turn is found at 
the homologous site in R1 in the Pick tau filament fold, whereas the 
same wide turn occurs at the homologous site in R3 in the Alzheimer 
fold (Fig. 3). This suggests that these semi-conserved turn structures 
may also be found in tau filament folds in other diseases. By contrast, 
the 364-Pro-Gly-Gly-Gly-367 motif adopts a new conformation in the 
Pick fold, which reverses the chain direction and is different from both 
the right-angle turn that this motif forms in the Alzheimer fold and the 
conformations of the homologous Pro-Gly-Gly-Gly motifs from the 
other repeats in both tau filament folds. The Pick and Alzheimer folds 
share similar secondary structure patterns, but different turn confor- 
mations result in distinct cross-$ packing. 

These findings show that the ordered cores of tau filaments from 
Pick’s disease adopt a single, novel fold of 3R tau, which is distinct from 
the tau filament fold found in Alzheimer’s disease. This suggests that 
additional folds may be found in tauopathies with 4R tau filaments, 
such as progressive supranuclear palsy. Our results also suggest that 
single, disease-specific folds may exist in tauopathies with the same 
tau filament isoform composition, such as progressive supranuclear 
palsy and corticobasal degeneration, since identical tau sequences can 
adopt more than one fold. Conserved secondary structure motifs and 
markedly different conformations at turn residues in the Alzheimer and 
Pick tau filament folds may form the basis for structural diversity in tau 
protein folds from other neurodegenerative diseases. The structures of 
different tau filament folds will provide the basis for determining the 
binding sites of tau ligands and give a rationale for their interactions 
with inclusions in different tauopathies. 

The identification of disease-specific folds in the ordered cores of 
tau filaments establishes the existence of molecular conformers. This 
is central to the hypothesis that conformers of filamentous tau give rise 
to the clinical phenotypes that define distinct tauopathies, akin to prion 
strains. By revealing the structural basis for molecular conformers in 
specific diseases, our results pave the way to a better understanding 
of a wide range of diseases related to abnormal protein aggregation. 
METHODS 


No statistical methods were used to predetermine sample size. The experiments 
were not randomized, and investigators were not blinded to allocation during 
experiments and outcome assessment. 

Extraction of tau filaments. Sarkosyl-insoluble material was extracted from the 
grey matter of frontal and temporal cortex of the patients’ brains, as described. 
Approximately 6 g tissue was used for cryo-EM and 0.6 g tissue was used for 
immunolabelling. The pelleted sarkosyl-insoluble material was resuspended in 
50 mM Tris-HCl pH 7.4 containing 150 mM NaCl and 0.02% amphipol A8-35 
at 250 μl per g tissue, followed by centrifugation at 3,000g for 30 min at 4 °C. The 
pellets, containing large contaminants, were discarded. The supernatants were 
centrifuged at 100,000g for 30 min at 4 °C. The resulting pellets were resuspended 
in buffer at 15 μl per g tissue for cryo-EM and 150 μl per g tissue for immunolabelling. 
Pronase treatment was carried out as described for negative-stain EM 
and cryo-EM. Tau filaments were incubated with 0.4 mg ml⁻¹ pronase (Sigma) 
for 5 min at 21 °C before being deposited on EM grids and plunge-frozen. 

Recombinant tau. Tau constructs lacking the BR136, anti-4R, BR135 and TauC4 
peptide sequences were cloned from pRK172 encoding wild-type 0N4R or 2N4R 
tau using the QuikChange Lightning site-directed mutagenesis kit (Agilent), 
according to the manufacturer’s instructions. All recombinant tau proteins 
were expressed and purified as described. 

Immunolabelling and histology. Western blotting and immunogold negative-stain 
EM were carried out as described previously. For western blotting, 
samples were resolved on 4–20% or 10% Tris-glycine gels (Novex), and the primary 
antibodies were diluted in PBS plus 0.1% Tween 20 and 1% bovine serum albumin 
(BSA). BR136 is a polyclonal antibody that was raised against a synthetic peptide 
corresponding to residues 244-257 of tau. The peptide (200 μg), coupled to keyhole 
limpet haemocyanin using glutaraldehyde, was mixed 1:1 with Freund’s complete 
adjuvant and used to immunize white Dutch rabbits. Booster injections using 
200 μg of conjugated peptide mixed 1:1 with Freund’s incomplete adjuvant were 
given every 2 weeks for 10 weeks after the primary immunization. Antibodies were 
obtained 7 days after the final booster injection and affinity purified. Extended 
Data Fig. 6 shows that BR136 is specific for the C-terminal region of residues 
244-257. Neurohistology and immunohistochemistry were carried out as 
described previously. Brain sections were 8-μm thick and were counterstained 
with haematoxylin. Detailed antibody information is provided in Extended Data 
Table 2. 

Whole-exome sequencing. Whole-exome sequencing was carried out at the 
Center for Medical Genomics of Indiana University School of Medicine using 
genomic DNA from the nine individuals with neuropathologically confirmed 
diagnoses of Pick’s disease, the tau filaments of which were used in Extended Data 
Figs. 6 and 7. Target enrichment was performed using the SureSelectXT human 
all exon library (V6, 58Mb, Agilent) and high-throughput sequencing using 
a HiSeq4000 (2 × 75-bp paired-end configuration, Illumina). Bioinformatics 
analyses were performed as described. Findings on MAPT, PSEN1 and APOE 
are presented in Extended Data Table 1. 

Seeded aggregation. Seeded aggregation was carried out as described, but 
with full-length wild-type tau protein and without the aggregation inducer heparin. 
Recombinant 0N3R and 0N4R tau were purified as described. Extracted 
tau filaments (15 μl per g tissue) were diluted 1:10 in 10 mM HEPES pH 7.4, 
200 mM NaCl; 2 μl was added to 98 μl of 20 μM 0N3R or 0N4R recombinant 
tau in the same buffer with 10 μM thioflavin T in a black, clear bottom 96-well 
plate (Perkin Elmer). The plate was sealed and incubated at 37 °C in a plate reader 
(BMG Labtech FLUOstar Omega), with cycles of shaking for 60 s (500 r.p.m., 
orbital) followed by no shaking for 60 s. Filament formation was monitored by 
measuring thioflavin T fluorescence every 45 min using 450 ± 10 nm excitation 
and 480 ± 10 nm emission wavelengths, with an instrument gain of 1,100. Three 
independent experiments were performed with separate recombinant protein 
preparations. 
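
The mixing volumes above imply final concentrations that are not stated explicitly. The sketch below works them out under two labelled assumptions: that "diluted 1:10" means one part extract in ten parts total, and that the 20 μM tau and 10 μM thioflavin T concentrations refer to the 98 μl solution before the seed is added. All function and variable names are illustrative.

```python
# Arithmetic sketch for the seeded-aggregation mix described in the text.
# Assumption: "1:10" = 1 part extract in 10 parts total volume.
# Assumption: 20 uM tau and 10 uM ThT are the concentrations before seeding.

def final_concentration(stock_conc, stock_volume_ul, added_volume_ul):
    """Concentration after adding a second volume that lacks the solute."""
    return stock_conc * stock_volume_ul / (stock_volume_ul + added_volume_ul)

def overall_seed_dilution(pre_dilution_fold, seed_volume_ul, total_volume_ul):
    """Fold dilution of the original extract in the final well."""
    return pre_dilution_fold * total_volume_ul / seed_volume_ul

SEED_UL = 2.0
TAU_UL = 98.0

tau_final_um = final_concentration(20.0, TAU_UL, SEED_UL)   # recombinant tau
tht_final_um = final_concentration(10.0, TAU_UL, SEED_UL)   # thioflavin T
seed_fold = overall_seed_dilution(10.0, SEED_UL, SEED_UL + TAU_UL)

# One shaking cycle is 60 s on plus 60 s off; reads are taken every 45 min.
cycle_s = 60 + 60
cycles_per_read = 45 * 60 / cycle_s

print(f"tau: {tau_final_um:.1f} uM, ThT: {tht_final_um:.1f} uM")
print(f"extract diluted {seed_fold:.0f}-fold in the well")
print(f"{cycles_per_read:.1f} shake/rest cycles between reads")
```

Under these assumptions the seed is present at a 500-fold dilution of the resuspended extract, so the recombinant protein is in large excess over the seeds.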

Electron cryo-microscopy. Extracted, pronase-treated tau filaments (3 μl at a 
concentration of approximately 0.5 mg ml⁻¹) were applied to glow-discharged 
holey carbon grids (Quantifoil Au R1.2/1.3, 300 mesh) and plunge-frozen in 
liquid ethane using an FEI Vitrobot Mark IV. Images were acquired on a Gatan 
K2-Summit detector in counting mode using an FEI Titan Krios at 300 kV. A 
GIF-quantum energy filter (Gatan) was used with a slit width of 20 eV to remove 
inelastically scattered electrons. In total, 52 movie frames were recorded, each 
with an exposure time of 250 ms using a dose rate of 1.06 electrons per Å² per 
frame, for a total accumulated dose of 55 electrons per Å² at a pixel size of 1.15 Å 
on the specimen. Defocus values ranged from −1.7 to −2.8 μm. Further details 
are presented in Extended Data Table 3. 
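
As a consistency check on the acquisition numbers, the total dose and total exposure time follow directly from the per-frame values quoted above. This is a minimal sketch; the names are illustrative and nothing here reflects the acquisition software itself.

```python
# Check that the per-frame values reproduce the quoted total dose.
N_FRAMES = 52
EXPOSURE_PER_FRAME_S = 0.250       # 250 ms per frame
DOSE_PER_FRAME = 1.06              # electrons per square angstrom per frame

def total_dose(n_frames, dose_per_frame):
    """Accumulated dose over the whole movie (electrons per square angstrom)."""
    return n_frames * dose_per_frame

def total_exposure_s(n_frames, exposure_per_frame_s):
    """Total exposure time of the movie in seconds."""
    return n_frames * exposure_per_frame_s

dose = total_dose(N_FRAMES, DOSE_PER_FRAME)
exposure = total_exposure_s(N_FRAMES, EXPOSURE_PER_FRAME_S)

print(f"total dose: {dose:.2f} e/A^2 (quoted as 55)")
print(f"total exposure: {exposure:.1f} s")
```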

Helical reconstruction. Movie frames were corrected for gain reference, 
motion-corrected and dose-weighted using MOTIONCOR2. Aligned, 
non-dose-weighted micrographs were used to estimate the contrast transfer function 
(CTF) in Gctf. All subsequent image-processing steps were performed using 
helical reconstruction methods in RELION 2.1. NPFs and WPFs were picked 
manually and processed as separate datasets. 

NPF dataset. NPF segments were extracted using a box size of 270 pixels and an 
inter-box distance of 14 Å. Reference-free 2D classification was performed using 
a regularization value of T = 2, and segments contributing to suboptimal 2D class 
averages were discarded. An initial helical twist of −0.73° was estimated from the 
crossover distances of NPFs in micrographs, and the helical rise was estimated to be 
4.7 Å. Using these values, an initial 3D reference was reconstructed de novo from 
2D class averages of segments comprising an entire helical crossover. A first round 
of 3D classification, starting from the de novo initial model low-pass filtered to 40 Å 
with local optimization of the helical twist and rise, and a regularization value 
of T = 4 yielded a reconstruction in which individual β-sheets perpendicular to 
the helical axis were clearly separated, but no structure was discernible along the 
helical axis. Subsequently, 3D auto-refinement with optimization of helical twist 
and rise and a regularization value of T = 10 was performed using the segments 
that contributed to the 3D class displaying β-sheets. The resulting reconstruction 
showed clearly discernible β-strand separation. 

An additional round of 3D classification with a regularization value of T = 10 
starting from the 5 Å low-pass filtered map from the previous auto-refinement 
was used to select further segments for a final high-resolution refinement. In 
total, 16,097 segments contributed to the final map. The reconstruction obtained 
with this relatively small subset of the initial dataset matched lower-resolution 
reconstructions obtained with larger subsets of the data, indicating that image 
classification did not select for a specific structure from a conformationally 
heterogeneous dataset, but instead was successful in distinguishing the segments with 
high-resolution information from images of varying quality. This is in line with 
observations in single-particle analysis. Superimposing the selected segments 
onto the original micrographs further confirmed this. Image classification also did 
not separate filaments with variable twists; instead, RELION combines segments 
from filaments with variable twists into a single 3D reconstruction and reduces the 
corresponding blurring effects by only using the central part of an intermediate 
asymmetrical reconstruction for real-space helical symmetrisation. We used a 
10% value for the corresponding helical z_percentage parameter. 

Optimization of the helical twist and rise converged onto −0.75° and 4.78 Å, 
respectively. Refinements with helical rises of multiples of 4.78 Å all led to β-strand 
separation, but in agreement with the observed absence of layer lines between 50 
and 4.7 Å we were unable to detect any repeating patterns along the helical axis 
other than the successive rungs of β-strands. 
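
The link between a measured crossover distance and the helical twist can be sketched with the usual geometric relation, in which one crossover corresponds to a 180° rotation of the filament. This is a hedged illustration of that relation, not the authors' script; the function names and the `sign` argument (used only to reproduce the negative twists quoted in the text) are assumptions.

```python
# Relation between helical rise, twist and crossover distance,
# assuming one crossover corresponds to a 180-degree rotation.

def crossover_distance(rise_angstrom, twist_deg):
    """Axial distance (angstrom) over which the filament rotates by 180 degrees."""
    return rise_angstrom * 180.0 / abs(twist_deg)

def twist_from_crossover(rise_angstrom, crossover_angstrom, sign=-1.0):
    """Twist per subunit (degrees) implied by a measured crossover distance."""
    return sign * 180.0 * rise_angstrom / crossover_angstrom

# Initial estimate quoted in the text: twist -0.73 degrees, rise 4.7 angstrom.
initial_crossover = crossover_distance(4.7, -0.73)
# Converged values quoted in the text: twist -0.75 degrees, rise 4.78 angstrom.
refined_crossover = crossover_distance(4.78, -0.75)

print(f"initial crossover: {initial_crossover:.0f} A")
print(f"refined crossover: {refined_crossover:.0f} A")
print(f"rungs per crossover: {180.0 / 0.75:.0f}")
```

With the converged parameters, one crossover spans 240 rungs, which is why small errors in the twist estimate translate into visible blurring far from the centre of the box.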

The final NPF reconstruction was sharpened using standard post-processing 
procedures in RELION, resulting in a B-factor of −57 Å² (Extended Data Table 3). 
Helical symmetry was imposed on the post-processed map using RELION helix 
toolbox. Final, overall resolution estimates were calculated from Fourier shell 
correlations at 0.143 between the two independently refined half-maps, using 
phase-randomization to correct for convolution effects of a generous, soft-edged 
solvent mask. The overall resolution estimate of the final map was 3.2 Å. Local 
resolution estimates were obtained using the same phase-randomization procedure, 
but with a soft spherical mask that was moved over the entire map. 
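
The Fourier shell correlation (FSC) criterion mentioned above can be illustrated with a short NumPy sketch. It computes the shell-wise correlation between two cubic half-maps and reports the resolution of the first shell that falls below a threshold. It is a simplified illustration: it omits masking and the phase-randomization correction described in the text, it is not RELION's implementation, and the function names and synthetic test maps are assumptions.

```python
import numpy as np

def fourier_shell_correlation(half1, half2):
    """FSC per integer-radius Fourier shell for two cubic maps of equal shape."""
    if half1.shape != half2.shape or len(set(half1.shape)) != 1:
        raise ValueError("expected two cubic maps of identical shape")
    n = half1.shape[0]
    f1 = np.fft.fftn(half1)
    f2 = np.fft.fftn(half2)
    idx = np.fft.fftfreq(n) * n                      # integer frequency indices
    kx, ky, kz = np.meshgrid(idx, idx, idx, indexing="ij")
    radius = np.rint(np.sqrt(kx**2 + ky**2 + kz**2)).astype(int)
    fsc = np.zeros(n // 2)
    for r in range(n // 2):
        shell = radius == r
        num = np.real(np.sum(f1[shell] * np.conj(f2[shell])))
        den = np.sqrt(np.sum(np.abs(f1[shell])**2) * np.sum(np.abs(f2[shell])**2))
        fsc[r] = num / den if den > 0 else 0.0
    return fsc

def resolution_at_threshold(fsc, pixel_size, box_size, threshold=0.143):
    """Resolution (same unit as pixel_size) of the first shell below threshold.

    Returns None if the curve never drops below the threshold.
    """
    for r in range(1, len(fsc)):
        if fsc[r] < threshold:
            return box_size * pixel_size / r
    return None

# Synthetic demonstration: shared signal plus independent noise in each half.
rng = np.random.default_rng(0)
signal = rng.normal(size=(16, 16, 16))
half_a = signal + rng.normal(size=signal.shape)
half_b = signal + rng.normal(size=signal.shape)
curve = fourier_shell_correlation(half_a, half_b)
print(np.round(curve, 2))
```

In practice the curve is computed on masked half-maps, and the reported number depends on the mask, which is why the text describes a phase-randomization correction.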

WPF dataset. The WPF dataset was down-scaled to a pixel size of 3.45 Å and 
segments were extracted using a box size of 180 pixels and an inter-box distance 
of 14 Å. As with the NPF dataset, an initial 3D reference was reconstructed de 
novo from 2D class averages of segments comprising an entire helical crossover. 
3D classification was then performed to discard suboptimal segments. 3D 
auto-refinement of the best class with a regularization value of T = 4 and a fixed 
helical rise and twist of 4.7 Å and −0.6°, respectively, led to a 3D structure with 
good separation of β-sheets perpendicular to the helical axis, but no structure was 
discernible along the helical axis. The cross-section of this map clearly revealed 
the presence of two NPF protofilaments. To further improve the reconstruction, 
we also made an initial model by placing two NPF maps, rotated 180° relative to 
each other in the WPF reconstruction, and low-pass filtering the resulting map to 
60 Å. After a second 3D auto-refinement starting from this model, the final WPF 
reconstruction had an estimated overall resolution of 8 Å and was sharpened by 
specifying a B-factor of −200 Å² (Extended Data Table 3). In total, 3,003 segments 
contributed to the final map. 
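
The box sizes and pixel sizes quoted for the two datasets determine the physical extent of each segment and the finest detail the sampling can represent. The sketch below makes that arithmetic explicit; the helper names are illustrative, and the Nyquist limit is taken as twice the pixel size.

```python
# Physical box length, Nyquist limit and binning factor implied by the
# box and pixel sizes quoted for the NPF and WPF datasets.

def box_length_angstrom(box_pixels, pixel_size_angstrom):
    """Edge length of the extracted box in angstrom."""
    return box_pixels * pixel_size_angstrom

def nyquist_limit_angstrom(pixel_size_angstrom):
    """Finest resolvable spacing for a given sampling (two pixels)."""
    return 2.0 * pixel_size_angstrom

NPF_PIXEL, NPF_BOX = 1.15, 270
WPF_PIXEL, WPF_BOX = 3.45, 180
INTER_BOX_DISTANCE = 14.0   # angstrom, both datasets
RISE = 4.78                 # angstrom, converged NPF value

npf_box_a = box_length_angstrom(NPF_BOX, NPF_PIXEL)
wpf_box_a = box_length_angstrom(WPF_BOX, WPF_PIXEL)
binning = WPF_PIXEL / NPF_PIXEL
rungs_per_step = INTER_BOX_DISTANCE / RISE

print(f"NPF box: {npf_box_a:.1f} A, Nyquist {nyquist_limit_angstrom(NPF_PIXEL):.2f} A")
print(f"WPF box: {wpf_box_a:.1f} A, Nyquist {nyquist_limit_angstrom(WPF_PIXEL):.2f} A")
print(f"down-scaling factor: {binning:.1f}")
print(f"rungs between neighbouring boxes: {rungs_per_step:.2f}")
```

The 8 Å estimate for the WPF map sits above the 6.9 Å Nyquist limit of the down-scaled data, so the down-scaling does not by itself cap the reported resolution.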

Model building and refinement. A single monomer of the NPF core was built 
de novo in the 3.2 Å resolution reconstruction using COOT. Model building 
was started from the distinctive extended β-spiral conformation of the 
332-Pro-Gly-Gly-Gly-335 motif, neighbouring the large histidine side chains of residues 
329 and 330, and working towards the N- and C-terminal regions by manually 
adding amino acids, followed by targeted real-space refinement. This assignment of 
amino acids was unambiguous due to the clear densities of the three Pro-Gly-Gly-Gly 
motifs and the large aromatic side chains of Tyr310 and Phe346. The model 
was then translated to give a stack of three consecutive monomers to preserve 
nearest-neighbour interactions for the middle chain in subsequent refinements 
using a combination of real-space refinement in PHENIX and Fourier-space 
refinement in REFMAC. In the latter, local symmetry restraints were imposed to 
keep all β-strand rungs identical. Because most of the structure adopts a β-strand 
conformation, hydrogen bond restraints were imposed to preserve a parallel, 
in-register hydrogen bonding pattern in earlier stages of the model building process. 
Side-chain clashes were detected using MOLPROBITY and corrected by 
iterative cycles of real-space refinement in COOT and Fourier-space refinement in 
REFMAC. The refined model of the NPF was rigid-body fitted into the WPF map. 
Separate NPF model refinements were performed against a single half-map, and 
the resulting model was compared to the other half-map to confirm the absence of 
overfitting. The final model was stable in refinements without additional restraints. 

Ethical review board and informed consent. The Indiana Alzheimer Disease 
Center studies were reviewed and approved by the Indiana University Institutional 
Review Board. Informed consent was obtained from the patients’ next of kin. 

Reporting summary. Further information on experimental design is available in 
the Nature Research Reporting Summary linked to this paper. 

Data availability. Cryo-EM maps have been deposited in the Electron Microscopy 
Data Bank (EMDB) under accession numbers EMD-0077 for NPF and EMD-0078 
for WPF. The NPF refined atomic model has been deposited in the Protein Data 
Bank (PDB) under accession number 6GX5. Whole-exome sequencing data that 
support the findings of this study have been deposited in the European Genome- 
Phenome Archive (EGA), which is hosted by the EBI and the CRG, under accession 
number EGAS00001003106. 
Extended Data Fig. 1 | Further characterization of the filamentous 
tau pathology of Pick’s disease. a, Staining of Pick bodies in the 
frontotemporal cortex of patient 4 by Bodian silver and antibody 
AT8 (pS202 and pT205 tau), but not by Gallyas–Braak silver. Nuclei 
are counterstained blue in the AT8 panel. Scale bars, 20 μm. 
b, c, Immunolabelling of the tau filaments extracted from the 
frontotemporal cortex of patient 4. b, Immunoblots with antibodies BR133 
(tau N terminus), BR134 (tau C terminus), RD3 (3R tau), anti-4R (4R tau), 
AT8 (pS202 and pT205 tau) and 12E8 (pS262 tau and/or pS356 tau). 
c, Immunogold negative-stain electron micrographs with antibodies 
BR133, BR134, 12E8 and MC1 of NPFs and WPFs with and without 
pronase treatment. Scale bar, 500 Å. 
Extended Data Fig. 2 | NPF structure. a, Fourier shell correlation curves 
between two independently refined half-maps (black line) and between 
the cryo-EM reconstruction and refined atomic model (red line). b, Local 
resolution estimates for the NPF reconstruction. c, Helical axis views of 
the NPF reconstruction. d, Close-up views of the cryo-EM map with the 
atomic model overlaid. The top row shows the three Pro-Gly-Gly-Gly 
(PGGG) motifs; the bottom row shows several amino acids with large side 
chains. 
Extended Data Fig. 3 | WPF structure. a, Fourier shell correlation 
curves between two independently refined half-maps. b, Local resolution 
estimates for the WPF reconstruction. c, WPF density at high (light grey) 
and low (dark grey) threshold with densities for two NPFs overlaid (yellow 
and blue). The atomic models fitted to the NPF densities in the region of 
the protofilament interface are shown in the boxed out area. d, Cryo-EM 
images showing WPFs from patient 4 (false coloured red), in which 
segments from one of the protofilaments have been lost. Scale bar, 500 Å. 
e, Negative-stain EM images of WPFs from patient 4 after incubation in 
100 mM dithiothreitol for 20 h. Scale bar, 500 Å. 
Extended Data Fig. 4 | Incompatibility of the Pick tau filament fold with 
4R tau. Atomic model of the Pick fold with the 4R tau sequence overlaid. 
The region formed by Lys254-Lys274 from R1 is replaced by the 
Ser285-Val300 region from R2 in 4R tau. Residues that differ between these 
regions of R1 and R2 are coloured orange. The major discrepancies of 
lysine at position 294 in R2, instead of threonine at position 263 in R1, and 
valine at position 300 in R2, instead of glutamine at position 269 in R1, are 
highlighted with dashed red outlines. The minor discrepancy of weaker 
interactions of Cys291 of R2 with Leu357 and Ile360 compared with those 
formed by Ile260 of R1 is highlighted with a dashed yellow outline. 
Extended Data Fig. 5 | Seeded aggregation of full-length 3R, but not 4R, 
tau by the sarkosyl-insoluble fraction from the brain of patient 4 with 
Pick’s disease. a, Coomassie-stained SDS-PAGE of the 0N3R and 0N4R 
recombinant tau preparations used for seeded aggregation. Two additional 
recombinant tau preparations were performed with similar results. 
b, Thioflavin T fluorescence measurements of 0N3R (red) and 0N4R 
(blue) recombinant tau after incubation with (triangles) or without 
(circles) the sarkosyl-insoluble fraction from the frontotemporal cortex 
of patient 4. The results are from three independent experiments using 
separate recombinant protein preparations. The sarkosyl-insoluble 
fraction from Pick’s disease brain efficiently seeded the aggregation of 3R, 
but not 4R, tau. RFU, relative fluorescent units. 
Extended Data Fig. 6 | Immunoblot analysis of additional Pick’s 
disease cases. a, Diagram of 2N4R tau showing the N-terminal inserts 
(N1 and N2), the repeats (R1-R4) and the epitopes of antibodies BR133 
(N terminus), BR136 (R1), anti-4R (R2), BR135 (R3), TauC4 (R4) and 
BR134 (C terminus). b, Immunoblots of epitope-deletion recombinant 
tau constructs with the antibodies shown in a. Identical results were 
obtained in two independent repeats. c, Immunoblots using the antibodies 
BR136, anti-4R, BR135 and TauC4 of tau filaments extracted from the 
frontotemporal cortex of 9 patients with Pick’s disease (patient 4 was used 
for cryo-EM). See Extended Data Table 1 for details of the patients with 
Pick’s disease. 
Extended Data Fig. 7 | Immunogold negative-stain EM analysis of 
additional Pick’s disease cases. a, Representative immunogold negative-stain 
electron microscopy with antibodies BR133 (tau N terminus), BR136 
(tau R1), anti-4R (tau R2), BR135 (tau R3), TauC4 (tau R4) and BR134 (tau 
C terminus) of NPFs and WPFs extracted from the frontotemporal cortex 
of patient 4. Scale bars, 100 nm. Similar results were obtained with tau 
filaments extracted from the frontotemporal cortices of eight additional 
patients with Pick’s disease. b, Table summarizing results from a. Results 
for patient 4 (shown in a and used for cryo-EM) are highlighted in yellow. 
See Extended Data Table 1 for details of the patients with Pick’s disease. 
Tick marks indicate antibody decoration of filaments, and crosses indicate 
that the antibodies did not decorate filaments. NPFs and WPFs were 
decorated by the antibodies against the N and C termini of tau, but not by 
the tau repeat-specific antibodies. 
Extended Data Table 1 | Summary of the patients with Pick’s disease 

Patient  Gender  Age at death (years)  MAPT  PSEN1  APOE haplotypes  Post-mortem interval (h) 
1        F       73                    wt    wt     ε3/ε3            23.8 
2        F       70                    wt    wt     ε3/ε3            20.5 
3        M       61                    wt    wt     ε3/ε4            24.5 
4        F       [illegible]           wt    wt     [illegible]      [illegible] 
5        M       70                    wt    wt     ε3/ε3            14.0 
6        M       65                    wt    wt     ε3/ε3            12.5 
7        M       64                    wt    wt     ε3/ε4            3.0 
8        M       56                    wt    wt     ε3/ε4            9.0 
9        M       69                    wt    wt     ε3/ε4            4.5 

Wild-type (wt) means that no known disease-causing mutations in the tau gene (MAPT) or the presenilin-1 gene (PSEN1) were detected. The patient used for cryo-EM is highlighted in yellow. 
Name     Epitope                                           Supplier         Cat. number      Species  Type        WB dilution  EM dilution  IHC dilution  Validation 
BR133    N-terminus                                        In house         -                Rabbit   Polyclonal  1:4,000      1:50         - 
BR134    C-terminus                                        In house         -                Rabbit   Polyclonal  1:4,000      1:50         -             Ref. 30 
BR136    R1                                                In house         -                Rabbit   Polyclonal  1:4,000      1:50         -             Extended Data Fig. 6 
Anti-4R  R2                                                Cosmo Bio        CAC-TIP-4RT-P01  Rabbit   Polyclonal  1:2,000      1:50         1:100         Manufacturer’s datasheet and Extended Data Fig. 6 
BR135    R3                                                In house         -                Rabbit   Polyclonal  1:4,000      1:50         -             [ref. illegible] and Extended Data Fig. 6 
TauC4    R4                                                Masato Hasegawa  -                Rabbit   Polyclonal  1:2,000      1:50         -             [ref. illegible] and Extended Data Fig. 6 
RD3      R1/3                                              Millipore        05-803           Mouse    Monoclonal  1:4,000      -            1:3,000       Manufacturer’s datasheet 
12E8     pS262 and/or pS356                                Peter Seubert    -                Mouse    Monoclonal  1:100,000    1:50         1:1,000       [refs illegible] 
AT8      pS202 and pT205                                   Thermo           MN1020           Mouse    Monoclonal  1:1,000      1:50         1:300         Manufacturer’s datasheet 
MC1      Discontinuous epitope (residues 7-9 and 313-322)  Peter Davies     -                Mouse    Monoclonal  -            1:10         -             - 

IHC, immunohistochemistry; EM, immunogold negative-stain electron microscopy; WB, western blot. 
Extended Data Table 3 | Cryo-EM data collection, refinement and validation statistics 

                                   NPF            WPF 
                                   (EMDB-0077)    (EMDB-0078) 
                                   (PDB 6GX5) 
Data collection and processing 
  Magnification                    ×105,000       ×105,000 
  Voltage (kV)                     300            300 
  Electron exposure (e⁻/Å²)        55             55 
  Defocus range (μm)               −1.7 to −2.8   −1.7 to −2.8 
  Pixel size (Å)                   1.15           3.45 
  Symmetry imposed                 None           C2 
  Initial particle images (no.)    83,475         8,024 
  Final particle images (no.)      16,097         3,003 
  Map resolution (Å)               3.2            8 
    FSC threshold                  0.143          0.143 
  Map resolution range (Å)         3.2 to 3.5     8 to 12 
  Helical rise (Å)                 4.78           4.7 
  Helical twist (°)                −0.75          −0.6 
Refinement 
  Initial model used (PDB code)    de novo        n/a 
  Model resolution (Å)             3.2            n/a 
    FSC threshold                  0.5 
  Model resolution range (Å)       3.2            n/a 
  Map sharpening B factor (Å²)     −57            −200 
  Model composition                               n/a 
    Non-hydrogen atoms             2133 
    Protein residues               282 
    Ligands                        0 
  B factors (Å²)                                  n/a 
    Protein                        75.8 average 
    Ligand                         n/a 
  R.m.s. deviations                               n/a 
    Bond lengths (Å)               0.0175 
    Bond angles (°)                1.1678 
Validation                                        n/a 
  MolProbity score                 2.15 
  Clashscore                       5.96 
  Poor rotamers (%)                4.94 
  Ramachandran plot                               n/a 
    Favored (%)                    95.65 
    Allowed (%)                    100 
    Disallowed (%)                 0 
Space firm Rocket Lab, whose launch site on New Zealand’s Mahia peninsula is shown here, typifies the country’s growing expertise in space technology. 


A land poised for lift-off 


New Zealand’s embrace of science promises to make it a rewarding career destination. 


BY JAMES MITCHELL CROW 


At high tide, the Cook Strait, the stretch of water separating New 
Zealand’s North and South Islands, 
is still and calm. Aboard the research vessel 
Tangaroa, spatial ecologist Kim Goetz seizes 
the moment before the tide turns, and sends 
an acoustic recorder plunging downwards. 
The recorder will spend the next six months 
moored to the seabed, capturing the calls of 
nearby marine mammals. 

Coordinating New Zealand's first large-scale 
deployment of acoustic recorders across the 
strait was one of Goetz’s first tasks after joining 
New Zealand's National Institute of Water and 


Atmospheric Research (NIWA) in Wellington. 
“More than half the world’s species of whales 
and dolphins come through New Zealand 
waters, and there’s next to nothing known 
about them,” says Goetz, who left her native 
United States in 2014 to join NIWA. A canyon 
along the strait’s sea floor potentially makes it a 
popular hangout for deep-diving species. 
Using the data collected from the recorders, 
Goetz has confirmed that the region is a 
hotspot for rare mammals, including elu- 
sive beaked whales (belonging to the family 
Ziphiidae). She says that the whales’ deep-sea 
habitat had made it difficult for researchers 
to learn much about them except when they 
wash ashore — until now. The team has also 


deployed acoustic moorings farther north, to 
monitor critically endangered Maui dolphins. 
NIWA and other New Zealand research 
organizations have seen sustained funding 
growth in the past decade, as the country 
has increasingly focused on supporting basic 
and applied science. In 2008, the government 
appointed its first chief scientific adviser, a role 
that University of Auckland biochemist Juliet 
Gerrard assumed on 1 July this year. Ger- 
rard says that she hopes to encourage ambi- 
tion in young female scientists. The nation’s 
federal science expenditure has increased 
by more than 70% over the past decade to 
NZ$1.58 billion (roughly US$1 billion) and 
the number of full-time equivalent research 
positions has grown by about 23% since 
2010, to around 25,000. 

A new national government, elected in 
late 2017 and headed by progressive Prime 
Minister Jacinda Ardern, has yet more- 
ambitious plans. Among its policies is making 
New Zealand a net-zero carbon emitter by 
2050. Existing technology won’t achieve that 
aim, says Peter Crabtree, general manager 
for science and innovation at the country’s 
Ministry of Business, Innovation and Employ- 
ment. One specific target, which could further 
expand researchers’ employment opportuni- 
ties, is to boost research-and-development 
investment from 1.28% of gross domestic 
product to 2% within a decade, bringing 
it closer to the 2.38% average for member 
nations of the Organisation for Economic 
Co-operation and Development. 

“There's a strong focus on increasing the 
proportion of investment from business,” 
Crabtree says. In the new government's first 
annual budget, NZ$1 billion was set aside 
over the next four years for an R&D tax incen- 
tive. The strategy to boost business R&D also 
emphasizes the need to attract innovative 
international businesses into the country — 
an approach that is already yielding results, 
including commercial-satellite launches. 

The nation’s distance from populous 
research hubs, its high cost of living and a sci- 
entific workforce that is only now beginning 
to grow might prove insurmountable obstacles 
for some, however. 


ROOM TO EXPLORE 

The nation’s research community has a 
collaborative, can-do attitude — as Goetz’s 
Cook Strait survey attests. “I’m not an acoustician; 
in the United States, nobody would 
have asked me to do that kind of thing,” says 
Goetz. But in New Zealand, whose total pop- 
ulation is less than that of Sydney, Australia, 
there is a smaller pool of researchers to draw 
from. “You can dabble in a lot of different 
areas,” she says. “You are always learning.” 


“There's fewer of us around, so we have to 
cover more bases,” agrees Charles Eason, chief 
executive of the Cawthron Institute in Nelson, 
the country’s largest independent science 
organization. The Cawthron aims to support 
the country’s economy through science while 
preserving the natural environment — in 
which New Zealand’s powerful indigenous 
Maori traditions are deeply rooted. “Our 
Maori culture plays through our psyche,” 
Eason says. “Maori culture is very strong in 
terms of environmental protection.” 

Nearly 15% of New Zealand’s population 
identify as Maori, yet Maori hold only 5% of 
academic research positions, says Anne-Marie 
Jackson, an indigenous New Zealander who 
researches Maori physical education, health 
and world view at the University of Otago. Still, 
she is seeing aspects of Maori culture seep into 
research culture. “The way we think about the 
environment has shifted,” Jackson says. Rather 
than considering a complex ecosystem in small 
pieces, researchers now try to take a whole- 
system approach. “That aligns with one of our 
core principles of kaitiakitanga, a holistic way 
of looking at anything,” Jackson says. 


LESS IN THE POT 

One downside of a small workforce and 
employer base is that fewer grants are avail- 
able than in a larger nation. Nick Golledge, 
an ice-sheet modeller and climate researcher 
at the Antarctic Research Centre at Victoria 
University of Wellington, says that he often 
loses talented graduate students as they come 
closer to earning their PhD, because he can't 
find postdoctoral funding for them. “It’s a 
small country, and the overall amount of 
money is small,” says Golledge. 

Others warn that research positions in 
highly specialized disciplines might not 
be available. “It is not easy to plan to come 
here,” says Olaf Morgenstern, a meteorologist 
at NIWA. “If you have a speciality that is 
gist at NIWA. “If you have a speciality that is 
not widespread, you might struggle to find a 
position.” Morgenstern had no plans to live in 


GOOD TO GO 


What to consider if you’re moving to New Zealand 


Scientists born overseas need a visa to live 
and work in New Zealand (see go.nature. 
com/nz_visas for details). Here’s a summary 
of the main categories available. 

• Some science-related jobs, including 
postdoctoral researcher and university 
lecturer, are on New Zealand’s Immediate 
Skill Shortage list. Those offered a job on 
this list can apply for the Essential Skills 
work visa, which grants temporary residency 
(see go.nature.com/nz_essential). 

• Some science-related jobs, including 
environmental research scientist and food 
technologist, are on the Long Term Skill 
Shortage list. Scientists who are granted 
a Long Term Skill Shortage list work visa 
can apply for permanent residency (see 
go.nature.com/nz_longterm). 

• New Zealand also has a skilled-migrant 
visa category, which grants permanent 
residency (see go.nature.com/nz_migrant). 
This visa class is points-based, with 
applicants accruing points on the basis of 
criteria such as age, skills, qualifications and 
experience. Applicants with qualifications 
on the Long Term Skill Shortage list, and 
applicants who have worked in New Zealand 
on a temporary visa, gain extra points. J.M.C. 
New Zealand, but he decided to move there 
from the United Kingdom when the right job 
happened to come up. 

Goetz arrived in New Zealand after a similar 
decision, and says that only in hindsight did 
she realize how fortunate she was to land a per- 
manent position. “I took it for granted coming 
out here,” she says. “Since then, I’ve had several 
friends that had a job on a year term, and they
have to leave because they can’t find another
job. I think that’s more common than not.” 

The country’s remote location can make 
itself felt in the cost of living. And property 
prices have risen sharply in the past decade 
— especially in Auckland, a global property 
market hotspot. Wellington, Christchurch and 
Dunedin were also ranked “severely unafford- 
able” in a 2018 international survey, meaning 
that houses there cost, on average, more than 
five times the median household income. 

Still, the diversified research landscape 
may beckon for some. New Zealand is also a 
geological hotspot, thanks largely to the fault 
line between the Pacific and Australian plates 
that runs up the country’s spine. “Earthquake 
research, the geomorphology of the landscape 
— New Zealand is a wonderful laboratory for 
that kind of work,” says James Metson, deputy 
vice-chancellor for research at the University 
of Auckland. 


SPACE RACE 

The government's 2015 analysis of the nation’s 
science strengths identified engineering, com- 
puter science, energy research, and physics 
and astronomy as particular specialisms — 
and ones that could offer a strong return from 
increased investment. Dividends are already 
paying off in one of those areas. In 2016, just 
five months after setting up a national space 
agency, New Zealand's parliament created a 
regulatory framework for space launches. And 
in January 2018, the US-New Zealand com- 
pany Rocket Lab launched its first commercial 
payloads into orbit. 

Although most of its funding comes from 
US venture capitalists, Rocket Lab was founded 
in Auckland in 2006 by New Zealand scientist 
and engineer Peter Beck. Its signature Elec- 
tron rocket, developed and built in Auckland, 
has been designed to capitalize on the soaring 
demand for sending small satellites into space. 

Scientists who have been in New Zealand 
for some time have few complaints. “A lot of us 
coming in realize we will probably never own 
a house,” says Goetz. But there are advantages
to the New Zealand lifestyle, she says. “People 
here really have a value for personal time, and 
for family time,” she says. “And there’s a lot of 
open spaces, beautiful scenery and things to 
get out and do.”

“You could perceive our distance as a 
disadvantage,” says Eason. “But people are 
pretty keen to come here.”


James Mitchell Crow is a freelance science 
writer based in Melbourne, Australia. 


SCIENCE FICTION


FAILSAFES


Be prepared.


BY STEWART C. BAKER


1. PLAN FOR THE LONG TERM

Once, the ruin was a city called Toronto,
and its buildings spread horizon to horizon,
a safe habitat for people like Jeydah.

Now it is only the ruin: a maze of
collapsed masonry and broken machines.
Jeydah visits once a month to gather
salvage and pick berries from the sprawling
brambles that grow in its once-proud
avenues. The people in the plains
villages think her strange, but they barter
with her all the same, and she stays fed on
the tools that she finds, the gadgets she
makes.

The best salvage, though, Jeydah keeps
for herself: a sphere that gives off light; a
pane of glass you can heat with a touch;
a small black block that speaks stories of
the way the world once was. She keeps
these treasures in a weatherproof sack,
safely hidden in a hollow tree half-a-day’s
walk from the ruin.

Or perhaps not so safely, because when
she stops at her tree one chill autumn day,
the sack is missing, her treasures gone. The
breaks in the long stalks of grass nearby look
recent, so Jeydah hikes on without stopping.
No matter how badly she wants her things
back, few people in the wastes are kind to
strangers.

It’s dark when she reaches the ruin, and
for the first time in all her visits, she notices
something new. Light — pale and steady as
her stolen sphere’s — pooled inside a narrow
arch at the ruin’s outermost edge.


2. ELIMINATE BARRIERS TO ACCESS

The arch leads to a stairwell, which leads to
a warren of tunnels. As Jeydah walks, arrows
light the floor, pointing her down a path she
would never have found without them.

At any other time, she would have marvelled
at this ancient tech, but tonight her
shoulders ache with tension. Although she
has not encountered whoever stole her
things, she saw dust rising from the dry
road behind her as she fled her tree. She jogs
through the tunnels as fast as she dares, hoping
her pursuer is more timid than she is.

After countless twists and turns, the
arrows dead-end against a wall — solid,
smooth and indistinguishable from any
other in this place. Jeydah spins around,
throat tight, but only manages three steps
before the tunnels plunge into darkness.

At least she knows she wasn't followed.

She laughs once, bleakly, then feels her
way back to the dead end and leans against
it, head bent, eyes closed. When the wall dissolves,
she falls right through.


3. BREAK DOWN COMPLEX TASKS 

The room behind the wall is large and open, 
suffused with a pale blue glow that leaves 
many of its features in shadow. More arrows 
lead Jeydah to the far wall, which whirrs 
open to reveal a bowl. The smell of chillies 
and roasted grain meets her nose, and her 
stomach growls. 

She should be cautious, but she’s tired 
and hasn’t eaten in nearly a day. Besides,
she’d be dead already if whoever built this
place wanted it. She lifts the bowl from its 
platform. 

The arrows change, leading her to a chair 
lurking in the shadows. The instant she sits, 
the wall flickers to life with staticky symbols. 
She can't read them, of course, but there’s a 
picture of a person in the chair pushing a 
button and that seems clear enough. 

She looks down, finds the button, and 
pushes it. 

The lights in the room increase, and the 
wall displays more symbols, more pictures: 
the person in the chair retrieves objects 
from the wall, puts them together to build 
tool after tool. The person multiplies until 
Jeydah stares, entranced, until the wall
flickers and shows a single picture of the 
archway, a black panel and a lightning 
bolt that’s cut in half. 

The wall whirrs open, revealing a panel.
Jeydah picks it up, looks at the picture on 
the wall, and heads into the labyrinth. 


4. ALWAYS HAVE A BACK-UP

Jeydah takes a few hesitant steps into the 
tunnels, relaxing only when she sees the 
arrows appear beneath her feet, pointing 
her onwards. 

After that, it takes no time at all to 
reach the stairs and climb to the surface, 
where the Moon hangs low and full above 
the ruin. Not much longer to clear the 
roof behind the arch of the debris that 
clutters it. 

Black panels glimmer in the moon- 
light, one of which is cracked. She pulls 
it free and replaces it with the new board, 
then slips back below the surface. 


5. MAKE PAST RESULTS REPRODUCIBLE 

As she follows the arrows back through the 
tunnel, Jeydah hears hiccupping sobs, high- 
pitched and desperate, that seem to come 
from everywhere at once. 

In all her explorations of the ruin, 
Jeydah’s never once met a ghost, so she 
squares her shoulders and follows the sound 
to a small, hungry-looking girl, hunched 
over a sack that looks awfully familiar and 
crying her eyes out. The child looks up as 
Jeydah approaches, clutches the sack to her 
bone-thin body, and scurries back against 
the tunnel wall, chin held up defiantly. 

If Jeydah had found the little thief at her 
clearing, she would have snatched the sack 
back in an instant. Fought the girl if she 
had to. Harsh, but the wastes are harsh. The 
world is. 

After the wonders of the labyrinth, 
though, after the secret room with its hot 
food and its worlds-ago tech, that seems 
petty. Unjust. She remembers the pictures 
on the wall. Remembers how hungry she 
was as a child. 

She squats down next to the girl, relaxes 
her shoulders, puts on her most disarming 
grin. “Hey,” she says. “You want something 
to eat?”


Stewart C. Baker is an academic librarian, 
haikuist and speculative-fiction writer based 
in Oregon. His fiction has appeared in Daily 
Science Fiction, Flash Fiction Online, 
Nature and other magazines. 